Running a high-performance pluggable transports Tor bridge

David Fifield <david@bamsoftware.com>
Linus Nordberg <linus@nordberg.se>

2023-07-10

https://www.bamsoftware.com/talks/foci-2023-pt-bridge/

https://www.bamsoftware.com/papers/pt-bridge-hiperf/

Overview

Tor (the current C implementation) has trouble scaling on a single relay, because one tor process cannot make full use of multiple CPU cores.
It is not a problem for most relays, which do not get enough traffic for it to matter.
But the problem is well-known to exit relay operators, and also affects certain pluggable transports bridges.
The Arti reimplementation of Tor will, in the future, make the scaling problem go away.
This talk is about a way to overcome the Tor bottleneck now, for those bridges that need it.

Direct-access transports

"Design of a blocking-resistant anonymity system", 2006:

Today, Tor relays operate on a few thousand distinct IP addresses; an adversary could enumerate and block them all with little trouble. To provide a means of ingress to the network, we need a larger set of entry points, most of which an adversary won't be able to enumerate easily.

Bridge obfs4 193.11.166.194:27020 86AC7B8D430DAC4117E9F42C9EAED18133863AAF cert=0LDeJH4JzMDtkJJrFphJCiPqKx7loozKN7VNfuukMGfHO0Z8OGdzHVkhVAOfo1mUdv9cMg iat-mode=0

Indirect-access transports

Bridge snowflake X.X.X.X:XXXX 2B280B23E1107BB62ABFC40DDCC8824814F80A72 url=https://snowflake-broker.torproject.net.global.prod.fastly.net/ front=cdn.sstatic.net ice=stun:stun.l.google.com:19302 utls-imitate=hellorandomizedalpn

No secret information in the bridge line: only one bridge is needed.

But now you run into the Tor scaling problem.

Tor blocking in Russia in December 2021 created a great demand for Snowflake bridges.

How to reduce tor CPU load on a single bridge?

The main Snowflake bridge is starting to become overloaded, because of a recent substantial increase in users. I think the host has sufficient CPU and memory headroom, and the pluggable transport process (that receives WebSocket connections and forwards them to tor) is scaling across multiple cores. But the tor process is constantly using 100% of one CPU core, and I suspect that the tor process has become a bottleneck.

In short: run multiple Tor processes on the bridge, ensure they have the same identity keys, and distribute traffic across them using a load balancer.
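
The "same identity keys" step can be sketched as follows. This is a minimal illustration, not the procedure from the installation guide: it assumes each tor instance has its own data directory with a `keys` subdirectory, and the directory layout and the helper name `replicate_keys` are hypothetical.

```python
import shutil
from pathlib import Path


def replicate_keys(primary: Path, others: list[Path]) -> None:
    """Copy the keys directory of one tor instance into the others.

    Assumption: every instance keeps its identity keys under
    <data directory>/keys, and all instances are stopped while copying.
    """
    source = primary / "keys"
    for data_dir in others:
        target = data_dir / "keys"
        # Discard any keys the instance generated for itself, so that
        # it ends up with exactly the primary instance's identity.
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(source, target)
```

A real deployment also has to set file ownership correctly for each instance's user, which this sketch does not handle.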

The traditional way of running a Tor pluggable transports bridge.

Our way, with multiple Tor processes, that permits better scaling.
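
As a rough illustration of the load-balancing step, the sketch below assigns incoming connections to tor instances in round-robin order. The backend addresses, the port numbers, and the `RoundRobin` class are hypothetical, and a production bridge would use a real load balancer rather than this toy policy.

```python
from itertools import cycle


class RoundRobin:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("need at least one backend")
        self._rotation = cycle(backends)

    def pick(self):
        """Return the backend that should receive the next connection."""
        return next(self._rotation)


# Hypothetical local listening addresses of four tor instances.
balancer = RoundRobin([("127.0.0.1", 10001 + i) for i in range(4)])

# Eight incoming connections are spread evenly, two per instance.
assignments = [balancer.pick() for _ in range(8)]
```

Because every instance holds the same identity keys, it does not matter which instance a given client connection lands on.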

Getting here required hardware investment as well as the load-balanced multiple-Tor architecture, but it is the architecture that made it possible to use the hardware to its fullest.

Most bridges do not need this technique.

It may be useful for other indirect-access transports like Conjure, default obfs4 bridges that serve many users, and maybe even exit nodes.

Hardware eventually becomes a limit anyway: the Tor anti-censorship team has had to establish a second Snowflake bridge.

Snowflake bridge installation guide.

Snowflake Daily Operations fundraising.