FOCI 2023 Paper #38 Reviews and Comments
===========================================================================
* Paper #38 Running a high-performance pluggable transports Tor bridge


Review #38A
===========================================================================

Overall merit
-------------
3. Weak accept

Reviewer expertise
------------------
1. No familiarity

Paper summary
-------------
The paper identifies the bottlenecks that arise as a bridge scales from 500 to 10,000 simultaneous users, and then from 10,000 to 50,000, and shows how to overcome them, based on the authors' experience running a large bridge. The key idea is running multiple Tor processes in parallel on the bridge host, with synchronized identity keys.

Comments for authors
--------------------
P. 1: The authors mention that their user base grew to 100,000 during the experiment. What resources does one need to scale to this number, i.e., is it possible to do this for an average-sized organization? What number of users would be needed to see a marked improvement across the Tor network?

P. 2: Parts of the text seem to be missing: "Each Tor process spawns an extor-static-cookie process (in the manner of Figure 1), in order to present a predictable Extended ORPort authentication secret through the load balancer. to spare. For us, this started to be a problem at around 6,000 simultaneous users and 10 MB/s of Tor bandwidth."

P. 3: What are the security risks that come with deploying the authors' solution? The authors seem to allude to this here: "It would be easy to patch Tor to use a hardcoded cookie, say, but maintaining a forked version of Tor complicates the deployment of security upgrades, which we deemed unacceptably risky." It is still not clear to me whether there is a risk to deploying this code.

Also on p. 3, just out of curiosity: would it be difficult for Tor to implement your suggestions? "The hack is effective, but it would be better if there were a supported way to do this in Tor." And what will it take for your solution to be deployed widely? Is there a strategy to disseminate the idea more broadly?


Review #38B
===========================================================================

Overall merit
-------------
3. Weak accept

Reviewer expertise
------------------
2. Some familiarity

Paper summary
-------------
This paper describes the bottlenecks involved in vertically scaling Tor bridges and proposes a workaround that enables said bridges to serve an increased number of simultaneous clients. Concretely, the paper identifies the single-threaded nature of Tor processes as the main factor limiting vertical scalability, and proposes a solution based on the execution of multiple Tor processes that can share the load imposed by a large number of clients on the bridge. As part of this solution, the authors devise a load balancer that distributes the load across multiple Tor processes, while modifying the configuration of said Tor processes to (i) interpose an adapter that enables Extended ORPort authentication between the pluggable transport and the load-balanced Tor processes, and (ii) not rotate onion keys. Other host-oriented tweaks aimed at enabling a larger number of concurrent client connections are also discussed.
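To make the deployment shape described in this summary concrete, the following is a minimal, purely illustrative sketch rather than the authors' actual configuration: a short Python script that writes one torrc per Tor instance and an HAProxy configuration fragment that spreads incoming pluggable-transport traffic across those instances. The instance count, ports, nicknames, and file paths are assumptions chosen for illustration, and the extor-static-cookie adapter and identity-key synchronization are omitted.

    #!/usr/bin/env python3
    """Illustrative generator for a load-balanced multi-Tor bridge layout.

    Writes one torrc per Tor instance plus an haproxy.cfg fragment that
    spreads incoming pluggable-transport connections across the instances.
    Ports, paths, and the instance count are arbitrary examples.
    """

    N_INSTANCES = 4           # assumed number of parallel Tor processes
    FIRST_OR_PORT = 9101      # assumed base port for per-instance ORPorts
    HAPROXY_FRONTEND = 10000  # assumed port the PT server forwards decoded traffic to

    TORRC_TEMPLATE = """\
    # torrc for instance {i} -- sketch only
    BridgeRelay 1
    Nickname ExampleBridge{i}
    DataDirectory /var/lib/tor-instances/{i}
    # All instances present the same bridge identity, so their identity
    # keys must be copied/synchronized out of band (not shown here).
    ORPort 127.0.0.1:{orport}
    # The Extended ORPort is what the PT (via extor-static-cookie in the
    # paper's design) authenticates to; 'auto' picks a free port.
    ExtORPort auto
    """

    HAPROXY_TEMPLATE = """\
    # haproxy.cfg fragment -- sketch only
    defaults
        mode tcp
        timeout connect 5s
        timeout client  60s
        timeout server  60s

    frontend pt_traffic
        bind 127.0.0.1:{frontend}
        default_backend tor_instances

    backend tor_instances
        balance roundrobin
    {servers}
    """

    def main():
        # One torrc per instance, each with its own data directory and ORPort.
        for i in range(1, N_INSTANCES + 1):
            with open(f"torrc.{i}", "w") as f:
                f.write(TORRC_TEMPLATE.format(i=i, orport=FIRST_OR_PORT + i - 1))

        # HAProxy backend list pointing at the per-instance ports.
        servers = "\n".join(
            f"    server tor{i} 127.0.0.1:{FIRST_OR_PORT + i - 1} check"
            for i in range(1, N_INSTANCES + 1)
        )
        with open("haproxy.cfg", "w") as f:
            f.write(HAPROXY_TEMPLATE.format(frontend=HAPROXY_FRONTEND, servers=servers))

    if __name__ == "__main__":
        main()

In the pipeline the reviews describe, an extor-static-cookie process additionally sits in front of each Tor instance so that the pluggable transport can still complete Extended ORPort authentication through the load balancer; that adapter and the key-synchronization step are deliberately left out of the sketch above.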
Comments for authors
--------------------
Thank you for submitting your work to FOCI! This is a good description of work that, from what I can see, tackles a very practical problem that operators of large Tor bridges may face. It is important to allow operators to take full advantage of their machines to forward Tor traffic, and the workarounds proposed in this paper enable them to do just that. I appreciate the detail that went into describing each specific bottleneck and each specific configuration that had to be tuned to enable the execution of parallel Tor processes.

The main concern I have with this paper, however, is that it somewhat reads as a report of an implementation effort, while its novelty (with regard to the difficulty of realizing the approach) appears to be slim. What can we learn from the paper that wouldn't be true for any other single-threaded application which needs to serve a large number of clients? Of course this work has Tor-specific aspects to deal with, but the described workarounds come across as perhaps too straightforward.

Besides a walkthrough of the implementation efforts, I wish the paper provided some evaluation to enable the reader to better assess the trade-offs of the proposed deployment. For instance, Section 3.1 states that "Apart from its cookie juggling, extor-static-cookie does nothing but add overhead to the communications pipeline." Can we quantify this overhead? Would this overhead be perceivable in any way to a typical Tor client? In addition, the solution to prevent the multiple Tor instances from drifting apart and using different onion keys, i.e., disabling onion key rotation, seems brittle. I would have liked to see some discussion of the security issues that may arise from the lack of onion key rotation in the long term.


Review #38C
===========================================================================

Overall merit
-------------
2. Weak reject

Reviewer expertise
------------------
3. Knowledgeable

Paper summary
-------------
The paper tries to study the issues pertinent to scaling Tor bridge traffic protected through PTs on a single machine. To that end, the authors enumerate a number of challenges in running multiple instances of PTs on a single machine, along with their workarounds. Further, they also list various performance bottlenecks.

Comments for authors
--------------------
It is not clear what problem the paper is trying to solve. It appears primarily that the authors are trying to explain how to run multiple Tor clients and thus multiple instances of the same PT (or different ones). This is fine, but it seems more like an engineering challenge, so I cannot gauge the research problem the authors plan to tackle. Furthermore, I'll be honest that this is not a research problem per se, but more a description of the engineering challenges of running multiple PT instances (through multiple Tor instances). Section 4 mostly reads as if the authors are suggesting various system implementation / operational parameters that are required for the smooth functioning of the setup with multiple PT instances.


Review #38D
===========================================================================

Overall merit
-------------
4. Accept

Reviewer expertise
------------------
3. Knowledgeable
Paper summary
-------------
In this submission, the authors discuss their experiences with, and the workarounds needed to support, a Tor bridge able to serve 100k pluggable-transport users. Though far from the expected scalability approach for pluggable-transport bridges (i.e., a large number of independent bridges with a low user load per bridge), pluggable transports such as meek and Snowflake require by design that a single system/instance/node support large numbers of users, and therefore encounter unexpected bottlenecks due to both the architectural design and the configuration of the various components. The biggest bottleneck found by the authors is the dependence on a single Tor process to handle all end-user traffic, for which the authors devise a network-level, localhost-only load balancer able to distribute user traffic to multiple Tor processes transparently, through the use of HAProxy, various OS/application configuration changes, and a custom 'extor-static-cookie' application to handle the pairing of processes and cryptographic material. In addition to this solution, the authors also describe other bottlenecks encountered subsequently (such as limits on file descriptors, TCP ports, and the firewall's connection-tracking implementation) and their solutions for those as well.
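As context for the host-level bottlenecks listed in this summary (file descriptors, TCP ports, connection tracking), below is a small, hypothetical Linux pre-flight check, not something taken from the paper: it reads standard kernel and process interfaces for those three resource classes so an operator can see the current headroom. The choice of Python and the specific /proc paths queried are assumptions; no recommended thresholds are implied.

    #!/usr/bin/env python3
    """Hypothetical pre-flight check for a many-connection bridge host (Linux).

    Reports the limits that correspond to the bottleneck classes mentioned
    in the review above: open file descriptors, ephemeral TCP ports, and
    the firewall's connection-tracking (conntrack) table.
    """

    import resource
    from pathlib import Path


    def read_proc(path):
        """Return the stripped contents of a /proc file, or None if unavailable."""
        try:
            return Path(path).read_text().strip()
        except OSError:
            return None


    def main():
        # Per-process open-file limit: every proxied user costs several
        # descriptors across the load balancer, the PT process, and a Tor instance.
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        print(f"open-file limit: soft={soft} hard={hard}")

        # Ephemeral port range: bounds concurrent outgoing connections per
        # (source address, destination) pair, e.g. load balancer -> Tor instance.
        port_range = read_proc("/proc/sys/net/ipv4/ip_local_port_range")
        if port_range is not None:
            low, high = map(int, port_range.split())
            print(f"ephemeral ports: {low}-{high} ({high - low + 1} usable)")

        # Conntrack table: once full, the kernel drops packets for new connections.
        ct_count = read_proc("/proc/sys/net/netfilter/nf_conntrack_count")
        ct_max = read_proc("/proc/sys/net/netfilter/nf_conntrack_max")
        if ct_count is not None and ct_max is not None:
            print(f"conntrack entries: {ct_count} of {ct_max} in use")


    if __name__ == "__main__":
        main()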
Comments for authors
--------------------
The authors are to be commended not only for their goal of greatly improving the performance of Tor bridges to enable end users to circumvent censorship, but also for their creativity and engineering-based solutions, which remain relatively straightforward (as opposed to a clean-slate redesign). This reviewer appreciated this approach, as its node-specific modifications make it significantly easier to deploy in the real world than alternatives that would require major changes to Tor's architecture/design/overall deployment. This reviewer is, however, unsure of the safety of this approach and its implications, which are not addressed in the submission. Specifically, the questions this reviewer has are:

- What is the security impact of not allowing rotation of the medium-term onion keys? (S3.2) There is certainly a reason for them to be rotated, and without addressing it, any potential danger cannot be evaluated by the reader.

- The Extended ORPort's secret cookie appears to be of great security importance, but what risk, if any, is there in re-mapping it via extor-static-cookie? (S3.1)

- What are the dangers of a single node supporting such high numbers of users, and why do the authors feel that this is an acceptable trade-off? A fundamental advantage of distributed solutions is to distribute both the load and the trust, but with ~100k users on a single bridge, this seems to create an enticing target for attackers (i.e., compromise a single node to reveal/block a small number of users vs. a single node to reveal/block a large number of users).

- What are the estimated bounds of scalability with these modifications? Though the authors' stated capacity of 100k users (S1) is within bounds, is the upper limit now on the order of 100k users or 1M users? What traffic throughput can be supported for user counts at this scale? Where are the next significant bottlenecks expected to be encountered, given the authors' experience and knowledge?

- Though the authors' findings and observations are undoubtedly useful prior to the Arti re-implementation being released, which ones do the authors expect to still be necessary afterwards? What advice do the authors have for the implementers/designers of Arti, given their experiences running a high-load pluggable-transport bridge, that might aid in avoiding such bottlenecks in the future?

In addition to the above questions, this reviewer also has other questions and comments which are not of great importance but may improve the current submission and assist in the authors' work towards a full-length conference submission. The authors may intend to address these through an expanded Discussion section, but they are listed here for completeness:

- In S2, why use listings with explanatory text as opposed to table(s)?
- Though readily discernible given the context, are the 'OR' and 'PT' abbreviations ever explained prior to their wide usage in S2?
- Why are details such as the user/bandwidth limitations buried in the text (i.e., 6k users / 10 MB/s in S3, para 1) and not highlighted in the intro?
- A significant amount of information is packed into S3.0, and paragraph headers would be of great use in structuring it.
- Why set file descriptors to 1M and not unlimited? (S4, para 2)

NITs:

- S2, para 1 --- 'Refer to Figure 1.' is not a stand-alone sentence
- S3, para 1 --- The line break at the end of the column is difficult to follow due to the phrasing and layout of the next column
- S4 --- 'Further bottlenecks' --> 'Additional Bottlenecks Addressed'
- S4, para 5 --- the 'common advice' needs a citation
- S4, para 7 --- 'to localhost.We use' is missing a space
- S5, para 1 --- 'The multiple-Tor architecture' is unclear as to whether it refers to multiple nodes, processes, etc.