Calibrating Tor Metrics user estimates

https://www.bamsoftware.com/talks/pets-2023-metrics/

David Fifield <david@bamsoftware.com>

pets-2023-metrics.zip

Tor directly connecting users, 2023-02-01 to 2023-05-31
https://metrics.torproject.org/userstats-relay-country.html?start=2023-02-01&end=2023-05-31&country=all&events=off

Reproducible Metrics §Users:

Due to the nature of Tor being an anonymity network, we cannot collect identifying data to learn the number of users. That is why we actually don't count users, but we count requests to the directories or bridges that clients make periodically to update their list of relays and estimate user numbers indirectly from there.

The result is an average number of concurrent users, estimated from data collected over a day. We can't say how many distinct users there are.

Not really a count of users, but of user-days.

Estimate the number of clients per country and day using the following formula:

r(N) = floor(r(R) / frac / 10)

A client that is connected 24/7 makes about 15 requests per day, but not all clients are connected 24/7, so we picked the number 10 for the average client. We simply divide directory requests by 10 and consider the result as the number of users. Another way of looking at it is that we assume that each request represents a client that stays online for one tenth of a day, so 2 hours and 24 minutes.
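
The formula and the "one tenth of a day" reading can be checked with a short sketch. The function names and the example inputs are hypothetical, and frac is assumed here to be a fraction between 0 and 1 (the share of requests that was actually reported); the production SQL may scale it differently.

```python
import math

# Scaling constant from the formula: assumed requests per average client per day.
REQUESTS_PER_USER_DAY = 10


def estimate_users(requests: float, frac: float,
                   constant: float = REQUESTS_PER_USER_DAY) -> int:
    """Apply r(N) = floor(r(R) / frac / constant).

    requests -- reported directory requests for one country and day, r(R)
    frac     -- assumed to be a fraction in (0, 1], not a percentage
    """
    if not 0 < frac <= 1:
        raise ValueError("frac is assumed to lie in (0, 1]")
    return math.floor(requests / frac / constant)


def online_time_per_request(constant: int = REQUESTS_PER_USER_DAY) -> tuple:
    """Return (hours, minutes) of online time that one request stands for."""
    minutes = 24 * 60 // constant
    return divmod(minutes, 60)


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    print(estimate_users(1_000_000, 0.5))
    print(online_time_per_request())
```

With a constant of 10, one request stands for 144 minutes of online time, which matches the 2 hours and 24 minutes quoted above.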

Cf. "Counting daily bridge users" 2012.

https://gitlab.torproject.org/tpo/network-health/metrics/website/-/blob/5843996162435a20193a2efebce6e1e36ef5192e/src/main/sql/clients/init-userstats.sql#L685

The number 10 is effectively a scaling constant.
Is it right?

Bridges only get connections directly from users, not from other relays.

A pluggable transports server process communicates with tor over a localhost TCP connection.

Idea: count the sockets connected to the tor ExtORPort.

This number should be equal to the number of currently connected users.

ss -n state established inet "dst 127.0.0.1" "dport ExtORPort" | sed 1d | wc -l
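
The same count can be sketched in Python, mirroring the pipeline above. The names count_sockets, current_sockets, and extor_port are hypothetical; the ss arguments are copied from the command above rather than verified independently, and the port number has to be supplied by the operator.

```python
import subprocess


def count_sockets(ss_output: str) -> int:
    """Count the lines after the header line, like `sed 1d | wc -l`."""
    lines = ss_output.splitlines()
    return max(len(lines) - 1, 0)


def current_sockets(extor_port: int) -> int:
    """Run the ss command from the text and count the sockets it lists."""
    result = subprocess.run(
        ["ss", "-n", "state", "established", "inet",
         "dst 127.0.0.1", f"dport {extor_port}"],
        capture_output=True, text=True, check=True,
    )
    return count_sockets(result.stdout)
```

Keeping the parsing separate from the subprocess call lets the counting logic be checked against saved ss output without access to a bridge.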

Snowflake bridges

Socket counts and Tor Metrics user count estimates
Socket counts vs. Tor Metrics user count estimates

For the snowflake-01 bridge, the scaling constant predicts a user count about 2× actual socket counts. (Constant should be ≈20.)

For the snowflake-02 bridge, the scaling constant predicts a user count about 1.5× actual socket counts. (Constant should be ≈15.)
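
The corrected constants follow from the overestimate ratios by simple proportion: the estimate is inversely proportional to the constant, so an estimate that is k times too high calls for a constant k times larger. A minimal sketch of that arithmetic, with a hypothetical function name and the approximate ratios quoted above as inputs:

```python
CURRENT_CONSTANT = 10  # scaling constant currently used in the estimate


def calibrated_constant(overestimate_ratio: float,
                        current: float = CURRENT_CONSTANT) -> float:
    """Constant that would make the estimate match the socket counts.

    estimate = requests / frac / current, and
    sockets  = estimate / overestimate_ratio,
    so the matching constant is current * overestimate_ratio.
    """
    if overestimate_ratio <= 0:
        raise ValueError("ratio must be positive")
    return current * overestimate_ratio


if __name__ == "__main__":
    print(calibrated_constant(2.0))   # snowflake-01, ratio of about 2
    print(calibrated_constant(1.5))   # snowflake-02, ratio of about 1.5
```

The ratios are rounded readings, so the resulting constants are approximate as well.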