diff --git a/thesis.tex b/thesis.tex index d1673a0..87366f4 100644 --- a/thesis.tex +++ b/thesis.tex @@ -115,7 +115,6 @@ \makeatletter \newcommand{\gobblecomma}[1]{\@gobble{#1}\ignorespaces} \makeatother -\index{Adobe Flash|see {Flash}} \index{Amazon CloudFront!zzz@\gobblecomma|seealso {meek-amazon}} \index{App Engine|see {Google App Engine}} \index{appspot.com@\nolinkurl{appspot.com}|see {Google App Engine}} @@ -125,8 +124,7 @@ \index{broker|see {Snowflake, broker}} \index{ciphersuite|see {TLS, fingerprinting}} \index{CDN|see {content delivery network}} -\index{CN|see {common name (X.509)}} -\index{CN|see {China}} +\index{CN|see {China; common name (X.509)}} \index{China!zzz@\gobblecomma|seealso {Great Firewall of China}} \index{CloudFront|see {Amazon CloudFront}} \index{decoy routing|see {refraction networking}} @@ -143,14 +141,19 @@ \index{Google App Engine!zzz@\gobblecomma|seealso {meek-google}} \index{Hypertext Transfer Protocol|see {HTTP}} \index{injection|see {packet injection}} +\index{ISP|see {Internet service provider}} \index{Microsoft Azure!zzz@\gobblecomma|seealso {meek-azure}} +\index{MITM|see {man in the middle}} \index{NDSS|see {Network and Distributed System Security Symposium}} +\index{NIDS|see {intrusion detection}} +\index{network intrusion detection system|see {intrusion detection}} \index{OpenSSH|see {obfuscated-openssh}} +\index{overblocking|see {false positive}} \index{PETS|see {Privacy Enhancing Technologies Symposium}} \index{pluggable transports!zzz@\gobblecomma|seealso {flash proxy; FTE; meek; obfs2; obfs3; obfs4; ScrambleSuit; Snowflake}} -\index{Portable Document Format|see {PDF}} \index{port scanning!zzz@\gobblecomma|seealso {active probing}} \index{precision|see {false positives}} +\index{proxy discovery problem|see {proxy distribution problem}} \index{recall|see {false negatives}} \index{SNI|see {Server Name Indication}} \index{Secure Sockets Layer|see {TLS}} @@ -169,6 +172,7 @@ \index{Secure Shell|see {SSH}} \index{Uniform Resource 
Locator|see {URL}} \index{U.S.|see {United States of America}} +\index{World Wide Web!zzz@\gobblecomma|seealso {HTTP; HTTPS}} \index{Tor bridge!zzz@\gobblecomma|seealso {Azadi, cymrubridge31, cymrubridge33, fdctorbridge01, GreenBelt, JonbesheSabz, LeifEricson, Lisbeth, MaBishomarim, Mosaddegh, ndnop3, ndnop4, ndnop5, noether, NX01, riemann}} @@ -265,8 +269,8 @@ wants to reach a destination on the outside. \index{border firewall} \end{figure} -A~\emph{client} resides within a network -that is entirely controlled by a \emph{censor}. +A~\emph{client}\index{client|textbf} resides within a network +that is entirely controlled by a \emph{censor}\index{censor|textbf}. Within the controlled network, the censor may observe, modify, inject, or block any communication along any link. @@ -279,7 +283,7 @@ that use certain protocols. The client's goal is to evade the censor's controls and communicate with some -\emph{destination} that lies outside the censor's network; +\emph{destination}\index{destination|textbf} that lies outside the censor's network; successfully doing so is called \emph{circumvention}\index{circumvention|textbf}. Circumvention means somehow safely traversing a hostile network, @@ -288,7 +292,7 @@ The censor does not control the network outside its border; it may send messages to the outside world, but it cannot control them after they have traversed the border. -This abstract model is a good starting point, +This abstract model\index{models} is a good starting point, but it is not the whole story. We will have to adapt it to fit different situations, sometimes relaxing and sometimes strengthening assumptions. @@ -306,15 +310,15 @@ The destination may knowingly cooperate with the client's circumvention effort, or may not. There are many possible complications, reflecting the messiness and diversity of dealing with real censors. -Adjusting the basic model to reflect real-world actors' -motivations and capabilities is the heart of \emph{threat modeling}. 
+Adjusting the basic model\index{models} to reflect real-world actors' +motivations and capabilities is the heart of \emph{threat modeling}\index{threat modeling}. In particular, what makes circumvention possible at all is the censor's motivation to block only some, but not all, of the incoming and outgoing communications---this assumption will be a major focus of the next chapter. -It is not hard to see how the border firewall model +It is not hard to see how the border firewall model\index{models} relates to censorship in practice. In a common case, the censor is the government of a country, and the limits of its controlled network correspond to @@ -329,7 +333,7 @@ Content restrictions may vary across geographic locations, even within the same country---Wright et~al.~\indexauthors{\cite{Wright2011a}} identified some reasons why this might be. -A~good model for some places is not a single unified regime, +A~good model\index{models} for some places is not a single unified regime, but rather several autonomous service providers, each controlling and censoring its own portion of the network, perhaps coordinating with others about what to block and perhaps not. @@ -371,7 +375,7 @@ Some other censorship-related topics that are \emph{not} in scope include: insofar as access to such services may be blocked \end{itemize} -Parts of the abstract model are deliberately +Parts of the abstract model\index{models} are deliberately left unspecified, to allow for the many variations that arise in practice. The precise nature of ``blocking'' can take many forms, from packet dropping\index{packet dropping}, to injection of false responses\index{packet injection}, @@ -410,7 +414,7 @@ is the thesis of the thesis, in which I~lay out opinionated general principles of the field of circumvention. The remaining chapters are split between -the topics of modeling and circumvention. +the topics of modeling\index{models} and circumvention. One's point of view is colored by experience. 
I~will therefore briefly describe the background to my research. @@ -439,17 +443,12 @@ to circumvent regularly, as a user. \chapter{Principles of circumvention} \label{chap:principles} -\begin{itemize} -\item Pluggable transports -% ss now has its own plugin system -\end{itemize} - \index{detection!versus blocking} \index{blocking!versus detection} In order to understand the challenges of circumvention, it helps to put yourself in the mindset of a censor. A censor has two high-level functions: detection and blocking. -Detection is a classification problem: +Detection is a classification\index{classification} problem: the censor prefers to permit some communications and deny others, and so it must have some procedure for deciding which communications fall in which category. @@ -457,64 +456,69 @@ Blocking follows detection. Once the censor detects some prohibited communication, it must take some action to stop the communication, such as terminating the connection at a network router. -A censor must be able both to detect and to block. +Censorship requires both detection and blocking. (Detection without blocking would be called surveillance, not censorship.) The flip side of this statement is that -a circumventor succeeds either by -eluding detection, or, once detected, -somehow resist the censor's blocking action. +circumvention has two ways to succeed: +by eluding detection, or, once detected, +by somehow resisting the censor's blocking action. A censor is, then, essentially -a traffic classifier coupled with -a blocking mechanism. +a traffic classifier\index{classification} coupled with +a blocking mechanism\index{blocking}. Though the design space is large, and many complications are possible, -at its heart it must decide, for each communication, +at its heart a censor must decide, for each communication, whether to block or allow, and then effect blocks as appropriate. Like any classifier, a censor is liable to make mistakes. 
When the censor fails to block something that it would have preferred to block, -it is an error called a \emph{false negative}; +it is an error called a \emph{false negative}\index{false negative|textbf}; when the censor accidentally blocks something that it would have preferred to allow, -it is a \emph{false positive}. -Techniques for avoiding detection are often called -network protocol ``obfuscation,'' -and the term is apt. -It reflects not an attitude of security through obscurity; -but rather a recognition that avoiding detection is about +it is a \emph{false positive}\index{false positive|textbf}. +Techniques for avoiding detection are often called +``obfuscation,''\index{obfuscation} +and the term is an appropriate one. +It reflects not an attitude of security through obscurity\index{security through obscurity}; +but rather a recognition that avoiding detection\index{detection} is about making the censor's classification problem more difficult, and therefore more costly. -Forcing the censor to trade false positives for false negatives +Forcing the censor to trade false positives\index{false positives} +for false negatives\index{false negatives} is the core of all circumvention that is based on avoiding detection. -The costs of misclassifications cannot be understood +The costs\index{costs} of misclassifications cannot be understood in absolute terms: -they only have meaning relative to a given censor -and its specific resources and motivations. -Understanding the relative importance the censor +they only have meaning relative to a specific censor +and its resources and motivations. +Understanding the relative importance that a censor assigns to classification errors---knowing -what it prefers to allow and to block---is helpful. -Through good modeling, +what it prefers to allow and to block---is key to knowing +what kind of circumvention will be successful.
+Through good modeling\index{models}, we can make the tradeoffs less favorable for the censor and more favorable for the circumventor. -The censor may base its detection decision +The censor may base its classification\index{classification} decision on whatever criteria it finds practical. I like to divide detection techniques into two classes: -\emph{detection by content} and \emph{detection by address}. -Detection by content is based on the content or topic +\emph{detection by content}\index{detection!by content} +and \emph{detection by address}\index{detection!by address}. +Detection by content is based on the content or topic\index{content|textbf} of the message: -keyword filtering and protocol identification fall into this class. +keyword\index{keywords} filtering and protocol identification fall into this class. Detection by address is based on the sender or recipient of the message: -IP address blacklists and DNS response tampering\index{DNS!poisoning} fall into this class. -An ``address'' may be any kind of identifier: -an IP address, a domain name, an email address. +IP address blacklists\index{blacklist} +and DNS response tampering\index{DNS!poisoning} fall into this class. +An ``address''\index{address|textbf} may be any kind of identifier: +an IP address, a domain name, an email\index{email} address. Of these two classes, my experience is that detection by address is harder to defeat. -Of course, there is no clear separation between -what is content and what is an address. -The layered nature of network protocols means that -an address at one layer is content at another. +The distinction is not perfectly clear because +there is no clear separation between +what is content and what is an address: +the layered nature of network protocols means that +one layer's address is another layer's content. Nevertheless, I find it useful to think about detection techniques in these terms. 
@@ -522,37 +526,35 @@ The censor may block the address of the destination, preventing direct access. Any communication between the client and the destination must therefore be indirect. -The intermediary between client and destination +The indirect link between client and destination is called a \emph{proxy}\index{proxy|textbf}, and it must do two things: provide an unblocked address for the client to contact; and somehow mask the contents of the channel and the eventual destination address. -Throughout this thesis, I will use the word ``proxy'' -with an abstract meaning of ``one that acts of behalf of another.'' -A~proxy need not be what is typically understood by the term ``proxy server,'' -a single host accepting and forwarding connections. +I~will use the word ``proxy'' expansively +to encompass any kind of intermediary, +not only a single host implementing a proxy protocol +such as an HTTP proxy\index{HTTP!proxy} or SOCKS\index{SOCKS} proxy. A~VPN\index{VPN} (virtual private network) is also a kind of proxy, as is the Tor network, as may be a specially configured network router. -In \autoref{chap:domain-fronting} we will see -a network of cloud servers acting as a proxy. -In \autoref{chap:snowflake} the proxy will -be a pool of temporary instances of some JavaScript code. +A~proxy is anything that acts on a client's behalf +to assist in circumvention. Proxies solve the first-order effects of censorship -(detection by content and address), +(detection by content and address),\index{detection!by content}\index{detection!by address} but they induce a second-order effect: the censor must now seek out and block proxies, in addition to the contents and addresses that are its primary targets. This is where circumvention research really begins: not with access to the destination per~se, -but access to a proxy, which transitively gives +but with access to a proxy, which transitively gives access to the destination.
-The censor attempts deals with detecting and blocking communication with proxies +The censor attempts to deal with detecting and blocking communication with proxies using the same tools it would for any other communication. -Just as it may look for forbidden keywords in text, +Just as it may look for forbidden keywords\index{keywords} in text, it may look for distinctive features of proxy protocols; just as it may block politically sensitive web sites, it may block the addresses of any proxies it can discover. @@ -561,64 +563,62 @@ is to use proxy addresses and proxy protocols that are difficult for the censor to detect or block. The way of organizing censorship and circumvention techniques -that I have presented is not the only way. -Köpsell and Hillig divide detection into -``content'' and ``circumstances''~\cite[\S 4]{Koepsell2004a}; -their circumstances include addresses and also what I would consider more content-like: -timing, data transfer characteristics, and protocols. -Philipp Winter divides circumvention into three problems: -bootstrapping, endpoint blocking, and traffic obfuscation~\cite[\S 1.1]{Winter2014c}. +that I have presented is not the only one. +Köpsell and Hillig~\indexauthors{\cite[\S 4]{Koepsell2004a}} +divide detection into +``content'' and ``circumstances''; +their ``circumstances'' include addresses and also features that +I~consider more content-like: +timing\index{packet size and timing}, data transfer characteristics, and protocols. +Winter~\indexauthors{\cite[\S 1.1]{Winter2014c}} +divides circumvention into three problems: +bootstrapping, endpoint blocking, and traffic obfuscation\index{obfuscation}. Endpoint blocking and traffic obfuscation correspond to my detection by address and detection by content; bootstrapping is the challenge of getting a copy of circumvention software and discovering initial proxy addresses. I tend to fold bootstrapping in with address-based detection; see \autoref{sec:address-strategies}. 
-Khattak, Elahi, et~al., -in their 2016 survey and systematization of circumvention systems, -break detection into four aspects: +Khattak, Elahi, et~al.\ break detection into four aspects~\indexauthors{\cite[\S 2.4]{Khattak2016a}}: destinations, content, flow properties, -and protocol semantics~\cite[\S 2.4]{Khattak2016a}. +and protocol semantics. I think of their ``content,'' ``flow properties,'' and ``protocol semantics'' as all fitting under the heading of content. -Tschantz et~al.\ identify ``setup'' and ``usage''~\cite[\S V]{Tschantz2016a-local}, -and Khattak, Elahi, et~al.\ identify -``communication establishment'' and ``conversation''~\cite[\S 3.1]{Khattak2016a}, -as targets of obfuscation; -these mostly correspond to address and content. +My split between address and content mostly corresponds to +Tschantz et~al.'s ``setup'' and ``usage''~\indexauthors{\cite[\S V]{Tschantz2016a-local}} +and Khattak, Elahi, et~al.'s +``communication establishment'' and ``conversation''~\indexauthors{\cite[\S 3.1]{Khattak2016a}}. What I call ``detection'' and ``blocking,'' -Khattak, Elahi, et~al.\ call ``fingerprinting'' and ``direct censorship''~\cite[\S 2.3]{Khattak2016a}, -and Tschantz et~al.\ call ``detection'' and ``action''~\cite[\S II]{Tschantz2016a-local}. +Khattak, Elahi, et~al.\ call ``fingerprinting'' and ``direct censorship''~\indexauthors{\cite[\S 2.3]{Khattak2016a}}, +and Tschantz et~al.\ call ``detection'' and ``action''~\indexauthors{\cite[\S II]{Tschantz2016a-local}}. A major difficulty in developing circumvention systems is that -however much you model and try to predict the reactions of a censor, -real-world stress testing is expensive. +however much you model\index{models} +and try to predict the reactions of a censor, +real-world testing is expensive. 
If you really want to test a design against a censor, not only must you write and deploy an implementation, integrate it with client-facing software like web browser, -and work out details of distribution---you +and work out details of its distribution---you must also attract enough users to merit a censor's attention. Any system, even a fundamentally broken one, will work to circumvent most censors, -as long as it is used only by one or a few clients. +as long as it is used only by one or only a few clients. The true test arises only after the system has begun to scale and the censor to fight back. This phenomenon may have contributed to the unfortunate -characterization of censorship and circumvention as a cat-and-mouse game: -deploying a weak circumvention system, -watching it get blocked as it becomes popular, -and starting over again with another similarly weak system. -In my opinion, the cat-and-mouse game is not inevitable. +characterization of censorship and circumvention as a cat-and-mouse game\index{cat-and-mouse game}: +deploying a flawed circumvention system, +watching it become more popular and then get blocked, +then starting over again with another similarly flawed system. +In my opinion, the cat-and-mouse game is not inevitable, +but is a consequence of inadequate understanding of censors. It is possible to develop systems that resist blocking---not -absolutely, but quantifiably in terms of costs to the blocker---even -after it has become popular. -We should think of the honeymoon period -while a system is too small to be worth noticing, -not as the beginning and end of a system's useful life, -but as a time to work out growing pains. +absolutely, but quantifiably, in terms of costs to the censor---even +after they have become popular. \section{Collateral damage} @@ -626,20 +626,20 @@ but as a time to work out growing pains. 
\index{collateral damage|(textbf} -What's to prevent the censor from -shutting down all connectivity within its network, -trivially preventing the client from reaching the destination? -The answer is that the censor derives some kind of benefit +What prevents the censor from +shutting down all connectivity\index{shutdown} within its network, +trivially preventing the client from reaching any destination? +The answer is that the censor derives benefits from allowing network connectivity, -other than that which it tries to censor. +other than the communications which it wants to censor. Or to put it another way: -the censor \emph{incurs a cost} -whenever it commits a false positive -(also called overblocking: inadvertently blocking something it would -have preferred to allow). +the censor incurs a cost\index{costs} +when it overblocks\index{false positive}: +accidentally blocks something it would +have preferred to allow. Because it wants to block some things and allow others, -the censor is forced to run as a classifier. +the censor is forced to run as a classifier\index{classification}. In order to avoid harm to itself, the censor permits some measure of circumvention traffic. @@ -649,36 +649,37 @@ The term is a bit unfortunate, evoking as it does negative connotations from other contexts. It helps to focus more on the ``collateral'' than the ``damage'': collateral damage is any cost -\emph{experienced by the censor} +experienced \emph{by the censor} as a result of incidental blocking done in the course of censorship. It must trade its desire to block forbidden communications against its desire to avoid harm to itself, -balance underblocking with overblocking. +balance underblocking\index{false negative} with overblocking\index{false positive}. 
Ideally, we force the censor into a dilemma: -unable to distinguish between circumvention and other traffic, +unable to distinguish\index{distinguishability} between circumvention and other traffic, it must choose either to allow circumvention along with everything else, or else block everything and suffer maximum collateral damage. -It is not necessary to fully reach this ideal before +It is not necessary to reach this ideal fully before circumvention becomes possible. Better obfuscation drives up the censor's error rate -and therefore the cost of any blocking. +and therefore the cost\index{costs} of any blocking. Ideally, the potential ``damage'' is never realized, -because the censor sees the cost as being too great. +because the censor sees the cost\index{costs} as being too great. -Collateral damage, being an abstract ``cost,'' can take many forms. +Collateral damage, being an abstract ``cost,''\index{costs} can take many forms. It may come in the form of civil discontent, as people try to access web sites and get annoyed with the government when unable to do so. It may be reduced productivity, as workers are unable to access resources they need to to their job. -This is the usual explanation for why the -Great Firewall of China has never blocked GitHub\index{GitHub} -for long\todo{when and how long?}, -despite GitHub's hosting and distribution -of circumvention software: +This is the usual explanation offered for why the +Great Firewall of China\index{Great Firewall of China} +has never blocked GitHub\index{GitHub} +for more than a few days, +despite GitHub's being used to host and distribute +circumvention software: GitHub is so deeply integrated into software development, -that programmers are not able to work when it is blocked. +that programmers cannot get work done when it is blocked.
Collateral damage, as with other aspects of censorship, cannot be understood in isolation, @@ -698,17 +699,21 @@ If the answers to these question is yes, then yes, the collateral damage is likely to be high. But if not, then the censor could take or leave those hundred sites---it doesn't matter. +Collateral damage is not just any harm that results +from censorship; +it is harm that is felt by the censor. Censors may take actions to reduce collateral damage while still blocking most of what they intend to. (Another way to think of it is: -reducing false positives without reducing false negatives.) +reducing false positives\index{false positive} +without increasing false negatives\index{false negative}.) For example, it has been repeatedly documented---by -Clayton et~al.~\cite{Clayton2006a}, -Winter and Lindskog~\cite{Winter2012a}, -and Fifield, Tsai, and Zhong~\autoref{chap:proxy-probe}, -for example---that the Great Firewall -prefers to block individual ports (or a small range of ports), +Clayton et~al.~\indexauthors{\cite{Clayton2006a}}, +Winter and Lindskog~\indexauthors{\cite{Winter2012a}}, +and Fifield, Tsai, and Zhong~(\autoref{chap:proxy-probe}), +for example---that the Great Firewall\index{Great Firewall of China} +prefers to block individual ports rather than blocking an entire IP address, probably in a bid to reduce collateral damage. In \autoref{chap:domain-fronting} we will see a system @@ -716,7 +721,7 @@ whose blocking resistance is based on widely used web services---the argument is that to block the circumvention system, the censor would have to block the entire web service. However this argument requires that the circumvention system's -use of the web service be indistinguishable from other uses---otherwise +use of the web service be indistinguishable\index{distinguishability} from other uses---otherwise the censor may selectively block only the connections used for circumvention.
Local circumstances may serve to reduce collateral damage: for example if a domestic replacement exists @@ -727,98 +732,103 @@ The censor's reluctance to cause collateral damage is what makes circumvention possible in general. (There are some exceptions, discussed in the next section, -where the censor can detect but is not capable of blocking.) -To deploying a circumvention system is to make a bet: -that the censor cannot field a classifier -that adequately distinguishes traffic of the circumvention system +where the censor can detect but for some reason cannot block.) +To deploy a circumvention system is to make a bet: +that the censor cannot field a classifier\index{classification} +that adequately distinguishes\index{distinguishability} +the traffic of the circumvention system from other traffic which, if blocked, would result in collateral damage. -Even steganographic circumvention channels that mimic some other protocol -ultimately derive their blocking resistance from a collateral damage argument: -that the censor feels that to block that other protocol -would result in too much damage to be worth it. -For example, a circumvention protocol that imitates HTTP +Even steganographic\index{steganography} circumvention channels that mimic some other protocol +ultimately derive their blocking resistance from the potential of collateral damage. +For example, a protocol that imitates HTTP\index{HTTP} can be blocked by blocking HTTP---the question then is whether the censor can afford to block HTTP. -And that's in the best case---assuming the circumvention protocol has no ``tell'' -that enables the censor easily to distinguish it from the cover protocol +And that's in the best case, assuming that the circumvention protocol has no ``tell'' +that enables the censor to distinguish\index{distinguishability} it from the cover protocol it is trying to imitate. 
-Indistinguishability is a necessary but not sufficient condition +Indistinguishability\index{distinguishability} is a necessary but not sufficient condition for blocking resistance: that which you are trying to be indistinguishable from must also have sufficient collateral damage. -It's of no use to have a perfect steganographic of a protocol +It's no use to have a perfect steganographic imitation +of a protocol that the censor doesn't mind blocking. In my opinion, collateral damage provides a more productive way to think about the behavior of censors than do alternatives. -It is able to take into account different censors' +It takes into account different censors' differing resources and motivations, -and so is more useful for generic modeling. +and so is more useful for generic modeling\index{models}. Moreover, it gets to the heart of what makes traffic resistant to blocking. -There have been many other attempts at -defining resistance to blocking. -Narain et~al.~\cite{Narain2014a} -called the essential element ``deniability,'' -meaning that a user could plausibly claim to have been doing +There are other ways of +characterizing censorship resistance. +Many authors---Burnett et~al.~\indexauthors{\cite{Burnett2010a}}, +and Jones et~al.~\indexauthors{\cite{Jones2014a}}, +for instance---call +the essential element ``deniability\index{deniability},'' +meaning that a client can plausibly claim to have been doing something other than circumventing when confronted with a log of their network activity.
-Khattak, Elahi, et~al.~\cite[\S 4]{Khattak2016a} also consider -``deniability'' separately from ``unblockability.'' +Khattak, Elahi, et~al.~\cite[\S 4]{Khattak2016a} consider +``deniability''\index{deniability} separately from ``unblockability\index{unblockability}.'' +% \cite{Narain2014a} also says ``deniability'' % \cite{Houmansadr2011a} also says ``deniability'' -% \cite{Burnett2010a} also says ``deniability'' -% \cite{Jones2014a} also says ``deniability'' Houmansadr et~al.~\cite{Houmansadr2011a,Houmansadr2013a,Houmansadr2013b} -used the term ``unobservability,'' -which I feel fails to convey that the censor's -essential function is distinguishing, not observation. -Brubaker et~al.~\cite{Brubaker2014a} -used the term ``entanglement,'' -which is closer to the mark -and inspired my own thinking. +used the term ``unobservability,''\index{unobservability} +which I feel fails to capture the censor's +essential function of distinguishing\index{distinguishability}, not only observation. +Brubaker et~al.~\indexauthors{\cite{Brubaker2014a}} +used the term ``entanglement\index{entanglement},'' +which I~found enlightening. What they call entanglement I think of as -indistinguishability, -and keep in mind that that which you are trying to be indistinguishable with -has to be something valued by the censor. +indistinguishability\index{distinguishability}---keeping +in mind that that which you are trying to be indistinguishable from +must be valued by the censor. Collateral damage provides a way to make statements about censorship resistance quantifiable, at least in a loose sense. 
-Rather than saying, ``the censor cannot block $X$,'' +Rather than saying, ``the censor cannot block $X$,''\index{unblockability} or even, ``the censor is unwilling to block $X$,'' -it is better to say ``in order to block $X$, the censor would have to do $Y$,'' +it is better to say ``in order to block $X$, the censor would have to do $Y$,''\index{costs} where $Y$ is some action bearing a cost for the censor. A statement like this makes it clear that some censors may be able to afford -the cost of doing $Y$ and others may not; -there is no ``unblockable'' in absolute terms. +the cost of blocking and others may not; +there is no ``unblockability\index{unblockability}'' in absolute terms. Now, actually quantifying the value of $Y$ is a task in itself, by no means a trivial one. -The state of research in this field is still far from being able to assign -actual numbers (e.g. in terms of dollars) to costs as perceived by censors. +A~challenge for future work in this field is to assign +actual numbers (e.g., in dollars) to the costs borne by censors. If a circumvention system becomes blocked, it may simply mean that the circumventor overestimated the collateral damage or underestimated the censor's capacity to absorb it. +\index{shutdown|(} We have observed that the risk of collateral damage is what prevents the censor from shutting down the network completely---and -yet, censors \emph{do} occasionally do complete shutdowns. -In fact the practice is increasing; -\todo{someone} reported \todo{some number} -of shutdowns in 2016. +yet, censors \emph{do} occasionally enact shutdowns or daily ``curfews.'' +Shutdowns are costly---West~\indexauthors{\cite{West2016a}} +looked at 81~shutdowns in 19~countries +in 2015 and 2016, and estimated that they collectively cost +\$2.4 billion in losses to gross domestic product. 
+Deloitte~\indexauthors{\cite{DeloitteShutdowns}} +estimated that shutdowns cost millions of dollars per day +per 10~million population, +the amount depending on a country's level of connectivity. This does not necessarily contradict the theory of collateral damage. -Shutdowns are indeed costly---\todo{someone} -estimated that shutdowns cost \todo{some amount}. It is just that, in some cases, -the calculus works out that the harm caused by a shutdown -does not outweigh (in the censor's mind) -the benefits of blocking access. +a censor reckons that +the benefits of a shutdown +outweigh the costs. As always, the outcome depends on the specific censor: censors that don't benefit as much from the Internet don't have as much to lose by blocking it. -The fact that shutdowns or ``curfews'' are limited in duration -shows that even censors that can afford to do a total shutdown cannot -afford to do it forever. +The fact that shutdowns are limited in duration +shows that even censors that can afford a shutdown cannot +afford to keep it up forever. +\index{shutdown|)} Complicating everything is the fact that censors are not bound to act rationally. @@ -827,8 +837,10 @@ complex entity, a censor is prone to err, to act impetuously, to make decisions that cause more harm than good. -One might even say that the very decision to censor -is exactly such an irrational decision, at the greater societal level. +The imposition of censorship in the first place, +I~suggest, +is exactly such an irrational action, +retarding progress at the greater societal level\index{costs!of censorship}. \index{collateral damage|)} @@ -838,32 +850,26 @@ is exactly such an irrational decision, at the greater societal level. \index{detection!by content|(} -\begin{itemize} -\item Sony thing on passive/active detection \cite[\S 5.1]{SladekBroseEANTC} -\item relation to website fingerprinting---circumvention is potentially harder because you can't just use e.g.
constant bitrate -\end{itemize} - -There are two general strategies to counter content-based blocking. +There are two general strategies to counter content-based detection. The first is to mimic some content that the censor allows, like HTTP\index{HTTP} or email\index{email}. The second is to randomize the content, -to make it dissimilar to anything that the censor specifically blocks. +making it dissimilar to anything that the censor specifically blocks. -Tschantz et~al.~\cite{Tschantz2016a-local} call these two strategies +Tschantz et~al.~\indexauthors{\cite{Tschantz2016a-local}} call these two strategies ``steganography''\index{steganography} and ``polymorphism''\index{polymorphism} respectively. -Another way to say it is ``look like something'' -and ``look like nothing.''\index{look-like-nothing transport} -They are not strict classifications---any -real system will incorporate a bit of both---and +It is not a strict classification---any +real system will incorporate a bit of both. +The two strategies reflect they reflect differing conceptions of censors. Steganography works against -a ``whitelisting'' or ``default-deny'' censor, +a ``whitelisting\index{whitelist}'' or ``default-deny'' censor, one that permits only a set of specifically enumerated protocols and blocks all others. Polymorphism, on the other hand, -falls to a whitelisting censor, -but works against a ``blacklisting'' or ``default-allow'' censor, +fails against a whitelisting censor, +but works against a ``blacklisting\index{blacklist}'' or ``default-allow'' censor, one that blocks a set of specifically enumerated protocols and allows all others. @@ -871,170 +877,164 @@ and allows all others. This is not to say that steganography is strictly superior to polymorphism---there are tradeoffs in both directions. Effective mimicry can be difficult to achieve, -and in any case effectiveness can only be judged -against a censor's specific computations of collateral damage. 
-Whitelisting, by its nature, -tends to cause more collateral damage than blacklisting. -And just as obfuscation protocols are not purely steganographic -or polymorphic, +and in any case its effectiveness can only be judged +against a censor's sensitivity to collateral damage\index{collateral damage}. +Whitelisting\index{whitelist}, by its nature, +tends to cause more collateral damage than blacklisting\index{blacklist}. +And just as obfuscation\index{obfuscation} protocols are +not purely steganographic or polymorphic, real censors are not purely whitelisting or blacklisting. -Houmansadr et~al.~\cite{Houmansadr2013b} +Houmansadr et~al.~\indexauthors{\cite{Houmansadr2013b}} exhibited weaknesses in ``parrot''\index{dead-parrot attacks} circumvention systems -that mimic a cover protocol but do not perfectly imitate it. +that imperfectly mimic a cover protocol. Mimicking a protocol in every detail, down to its error behavior, is difficult, and any inconsistency is a potential feature that a censor may exploit. -Wang et~al.~\cite{Wang2015a} found that some of -Houmansadr et~al.'s proposed attacks were impractical, -due to high false-positive rates, -but proposed other attacks designed for efficiency -and low false positives, -against both steganographic and polymorphic protocols. -Geddes et~al.~\cite{Geddes2013a} showed that even perfect imitation -(achieved via tunneling) may leave vulnerabilities +Wang et~al.~\indexauthors{\cite{Wang2015a}} found that some of +the proposed attacks against parrot systems would be impractical +due to high false-positive rates\index{false positive}, +but offered other attacks designed for efficiency +and low false positives\index{false positive}. +Geddes et~al.~\indexauthors{\cite{Geddes2013a}} showed that even perfect imitation +may leave vulnerabilities due to mismatches between the cover protocol and -the covert protocol---for instance randomly dropping packets -may disrupt circumvention more than other uses of the cover protocol. 
-It's worth noting, though, that apart from active probing and +the carried protocol. +For instance, randomly dropping packets +may disrupt circumvention more than normal use of the cover protocol. +It's worth noting, though, that apart from active probing\index{active probing} and perhaps entropy measurement, most of the attacks proposed -in academic literature have not been used by censors in practice. +in academic research have not been used by censors in practice. Some systematizations -(for example those of Brubaker et~al.~\cite[\S 6]{Brubaker2014a}; -Wang et~al.~\cite[\S 2]{Wang2015a}; and -Khattak, Elahi, et~al.~\cite[\S 6.1]{Khattak2016a}) +(for example those of Brubaker et~al.~\indexauthors{\cite[\S 6]{Brubaker2014a}}; +Wang et~al.~\indexauthors{\cite[\S 2]{Wang2015a}}; and +Khattak, Elahi, et~al.~\indexauthors{\cite[\S 6.1]{Khattak2016a}}) further subdivide steganographic systems into those based on mimicry (attempting to replicate the behavior of a cover protocol) -and tunneling +and tunneling\index{tunneling} (sending through a genuine implementation of the cover protocol). -I do not find the distinction useful, -except when speaking of concrete implementation choices; -to me, there are various degrees of fidelity in imitation, +I~do not find the distinction very useful, +except when discussing concrete implementation choices. +To me, there is no clear division: +there are various degrees of fidelity in imitation, and tunneling only tends to offer higher fidelity -than mimicry. - -I will list some representative circumvention systems -that exemplify the steganographic strategy. -Infranet~\cite{Feamster2002a}\index{Infranet}, way back in 2002, -built a covert channel out of HTTP\index{HTTP}, -encoding upstream data in special requests -and downstream data using standard steganography in image files. 
-(An aside on the evolution of threat models: -the authors of Infranet rejected the possibility of using TLS\index{TLS}, -because TLS was not then common enough that its wholesale blocking -would cause much damage. -Today the situation surrounding TLS is much different, -and it is much relied on by circumventors.) +than does mimicry. + +I~will list some circumvention systems +that represent the steganographic strategy. +Infranet~\indexauthors{\cite{Feamster2002a}}\index{Infranet}, way back in 2002, +built a covert channel within HTTP\index{HTTP}, +encoding upstream data as crafted requests +and downstream data as steganographic images. % Collage~\cite{Burnett2010a}\index{Collage} % Facade~\cite{Jones2014a}\index{Facade} (2014) updates Infranet. -StegoTorus~\cite{Weinberg2012a} (2012) uses custom encoders +StegoTorus~\indexauthors{\cite{Weinberg2012a}}\index{StegoTorus} (2012) uses custom encoders to make traffic resemble common HTTP file types, -such as PDF\index{PDF}, JavaScript\index{JavaScript}, and Flash\index{Flash}. -SkypeMorph~\cite{Moghaddam2012a}\index{SkypeMorph} (2012) mimics a Skype\index{Skype} video call. -FreeWave~\cite{Houmansadr2013a}\index{FreeWave} (2013) modulates a data stream -into an acoustic signal and transmits it over VoIP\index{voice over IP}. -FTE~\cite{Dyer2013a}\index{FTE} (for ``format-transforming encryption''; 2013) -and its followup Marionette~\cite{Dyer2015a}\index{Marionette} (2015) +such as PDF, JavaScript\index{JavaScript}, and Flash. +SkypeMorph~\indexauthors{\cite{Moghaddam2012a}}\index{SkypeMorph} (2012) mimics a Skype\index{Skype} video call. +FreeWave~\indexauthors{\cite{Houmansadr2013a}}\index{FreeWave} (2013) modulates a data stream +into an acoustic signal and transmits it over VoIP\index{VoIP}. +Format-transforming encryption, +or FTE~\indexauthors{\cite{Dyer2013a}}\index{FTE} (2013), force traffic to conform to a user-specified syntax: if you can describe it, you can imitate it. 
-Despite the research attention they have received, -steganographic systems have not been as used in practice: -of these listed systems, FTE is the only one that +Despite receiving much research attention, +steganographic systems have not been as used in practice +as polymorphic ones. +Of the listed systems, only FTE has seen substantial deployment. \index{steganography|)} \index{polymorphism|(} -There are many examples of the randomized, polymorphic strategy. -An important subclass of these are the so-called -look-like-nothing systems\index{look-like-nothing transport} that encrypt a stream +There are many examples of the randomized\index{randomization+}, polymorphic strategy. +An important subclass of these comprises the so-called +look-like-nothing systems\index{look-like-nothing transport} that encrypt\index{encryption+} a stream without any plaintext header or framing information, -so that it appears to be a uniformly random byte sequence. +so that it appears to be a uniformly random\index{randomness+} byte sequence. A pioneering design was the obfuscated-openssh\index{obfuscated-openssh} -of Bruce Leidl~\cite{Leidl-obfuscated-openssh}, +of Bruce Leidl~\indexauthors{\cite{Leidl-obfuscated-openssh}}, which aimed to hide the plaintext packet metadata in the SSH protocol\index{SSH}. obfuscated-openssh worked, in essence, -by first sending a cryptographic key, +by first sending an encryption key, and then sending ciphertext encrypted with that key. -The encryption of the obfuscation layer was an additional, independent layer -on top of SSH's usual encryption. -A censor could, in principle, purely passively detect and deobfuscate -the protocol just by recovering the key and using it to decrypt the rest---a -situation partially mitigated by the use of an expensive key derivation function -based on iterated hashing. -obfuscated-openssh could optionally incorporate a pre-shared password -into the key derivation function, which would prevent easy identification. 
-Dust\index{Dust}~\cite{Wiley2011a}, a design by Brandon Wiley, -similarly randomized bytes +The encryption of the obfuscation layer was an additional layer, +independent of SSH's\index{SSH} ordinary encryption. +A censor could, in principle, passively detect and deobfuscate +the protocol by recovering the key and using it to decrypt the rest of the stream. +obfuscated-openssh could optionally incorporate a pre-shared password\index{authentication} +into the key derivation function, which would protect against this attack. +Dust\index{Dust}~\cite{Wiley2011a}, +similarly randomized\index{randomness} bytes (at least in its v1 version---later versions permitted fitting to distributions other than uniform). -It was not susceptible to passive deobfuscation, -relying on an out-of-band key exchange before each session. +It was not susceptible to passive deobfuscation +because it required an out-of-band key exchange +to happen before each session. Shadowsocks~\cite{Shadowsocks}\index{Shadowsocks} is a lightweight encryption layer atop a simple proxy protocol. -There is a line of successive look-like-nothing protocols---known by the names -obfs2, obfs3, ScrambleSuit, and obfs4---whose history -is interesting, because it illustrates mutual advances by +There is a line of successive look-like-nothing\index{look-like-nothing transport} +protocols---obfs2, +obfs3, +ScrambleSuit, +and obfs4---which I~like +because they illustrate the mutual advances of censors and circumventors over several years. -obfs2~\cite{obfs2}, which debuted in 2012 in response to -blocking in Iran~\cite{tor-blog-obfsproxy-next-step-censorship-arms-race}, +obfs2\index{obfs2|textbf}~\cite{obfs2}, which debuted in 2012 in response to +blocking in Iran\index{Iran}~\cite{tor-blog-obfsproxy-next-step-censorship-arms-race}, uses very simple obfuscation inspired by obfuscated-openssh: it is essentially equivalent to sending an encryption key, -followed by the rest of the stream encrypted by that key. 
-obfs2 is detectable, with no false negatives and negligible false positives, +then the rest of the stream encrypted with that key. +obfs2 is detectable, +with no false negatives\index{false negative} +and negligible false positives\index{false positive}, by even a passive censor who knows how it works; -and it is vulnerable to active probing attacks, -where the censor speculatively connects to the proxy to see what protocol it uses. -However, it was sufficient against the -keyword- or pattern-based censors of its era. -obfs3~\cite{obfs3}---first available in 2013 +and it is vulnerable to active probing\index{active probing} attacks, +where the censor speculatively connects to servers to see what protocols they use. +However, it sufficed against the +keyword-\index{keywords} and pattern-based censors of its era. +obfs3\index{obfs3|textbf}~\cite{obfs3}---first available in 2013 but not really released to users until 2014~\cite{tor-blog-tor-browser-36-released}---was designed to fix the passive detectability of its predecessor. -obfs3 employs a Diffie--Hellman key exchange\index{UniformDH} that +obfs3 employs a Diffie--Hellman key exchange\index{Diffie--Hellman key exchange} that prevents easy passive detection, -but it can still be subverted by an active man in the middle, -and remains vulnerable to active probing. -(The Great Firewall of China had begun active-probing +but it can still be subverted by an active man in the middle\index{man in the middle}, +and remains vulnerable to active probing\index{active probing}. +(The Great Firewall of China\index{Great Firewall of China} +had begun active-probing for obfs2 by January 2013, and -for obfs3 by February 2015, -or possibly as early as July 2013~\cite[\S 5.4]{Ensafi2015b}.) -ScrambleSuit~\cite{Winter2013b}, +for obfs3 by August 2013---see \autoref{tab:active-probing-timeline}.) 
+ScrambleSuit\index{ScrambleSuit|textbf}~\cite{Winter2013b}, first available to users in 2014~\cite{tor-blog-tor-browser-364-and-40-alpha-1-are-released}, arose in response to the active-probing of obfs3. -Its improvements were the use of an out-of-band secret -to authenticate clients, +Its innovations were the use of an out-of-band secret +to authenticate\index{authentication} clients, and traffic shaping techniques to perturb the -underlying stream's statistical properties. +underlying stream's statistical properties\index{packet size and timing}. When a client connects to a ScrambleSuit proxy, -it must demonstrate knowledge of the out-of-band secret, -or else the server will not respond, -preventing active probing. -(Active probing resistance really has more to do with blocking by address -than with blocking by content, -but it is only because the randomized transports -sufficiently frustrated content-based detection -that active probing became relevant.) -obfs4~\cite{obfs4}, first available in 2014, +it must demonstrate knowledge of the out-of-band secret +before the proxy will respond, +which prevents active probing. +obfs4\index{obfs4|textbf}~\cite{obfs4}, first available in 2014~\cite{tor-blog-tor-browser-45-released}, is an incremental advancement on ScrambleSuit that uses more efficient cryptography, -and additionally authenticates the key exchange -to prevent active man-in-the-middle attacks. - -% obfs4 now used in a variety of projects. +and additionally authenticates\index{authentication} the key exchange +to prevent active man-in-the-middle\index{man in the middle} attacks. There is an advantage in designing polymorphic protocols, -as opposed to steganographic ones, +as opposed to steganographic\index{steganography} ones, which is that every proxy can potentially have its own characteristics. ScrambleSuit and obfs4, in addition to randomizing packet contents, -also shape packet lengths and timing to fit random distributions. 
-Crucially, the chosen distributions are consistent within each server, -not generated afresh for each connection. +also shape packet sizes and timing\index{packet size and timing} +to fit random\index{randomness+} distributions. +Crucially, the chosen distributions are consistent within each proxy, +but vary across proxies. That means that even if a censor is able to build a profile -for a particular server, -it is not necessarily useful for detecting other server instances. +for a particular proxy, +it is not necessarily useful for detecting other instances. \index{polymorphism|(} \index{detection!by content|)} @@ -1044,15 +1044,10 @@ it is not necessarily useful for detecting other server instances. \index{detection!by address|(} -\begin{itemize} -\item VPN Gate ``collaborative spy detection''~\cite[\S 4.3]{Nobori2014a}, other ways of fingerprinting censor -\item DEFIANCE~\cite{Lincoln2012a} -\end{itemize} - The first-order solution for reaching a destination whose address is blocked is to instead route through a proxy. But a single, static proxy is not much better than direct access, -from a circumvention point of view---a censor can block +for circumvention purposes---a censor can block the proxy just as easily as it can block the destination. Circumvention systems must come up with ways of addressing this problem. @@ -1061,12 +1056,10 @@ There are two reasons why resistance to blocking by address is challenging. The first is due to the nature of network routing: the client must, somehow, encode -the address of the destination into what it sends, -where it can be observed by the censor, -if the encoding is sufficiently transparent. -The second is the insider attack: +the address of the destination into the messages it sends. +The second is the insider attack\index{insider attack}: legitimate clients must have some way to discover -addresses of, e.g., proxies. +the addresses of proxies. 
By pretending to be a legitimate client, the censor can learn those addresses in the same way. @@ -1074,29 +1067,31 @@ Compared to content obfuscation, there are relatively few strategies for resistance to blocking by address. They are basically five: -private proxies shared by only a few clients; -having a large population of secret proxies and -distributing them carefully; -having a very large population of proxies and -treating them as disposable; -proxying through a service with high collateral damage; -and address spoofing. +\begin{itemize} +\item sharing private proxies among only a few clients +\item having a large population of secret proxies and +distributing them carefully\index{proxy distribution} +\item having a very large population of proxies and +treating them as disposable +\item proxying through a service with high collateral damage\index{collateral damage} +\item address spoofing\index{address spoofing} +\end{itemize} The simplest proxy infrastructure is no infrastructure at all: require every client to set up and maintain a proxy for their own personal use, or for a few of their friends. As long as the use of any single address remains low, it may escape the censor's notice~\cite[\S 4.2]{tor-techreport-2006-11-001}. -The problem with this strategy, of course, is usability and scalability. +The problem with this strategy, of course, is usability\index{usability} and scalability. If it were easy for everyone to set up their own proxy on an unblocked address, they would do it, and blocking by address would not be a concern. The challenge is making such techniques general so they are usable by more than experts. -uProxy~\cite{uproxy} is now working on just that: +uProxy\index{uProxy}~\cite{uproxy} is now working on just that: automating the process of setting up a proxy on a server. 
-What Köpsell and Hillig call the ``many access points'' model +What Köpsell and Hillig call the ``many access points'' model~\indexauthors{\cite[\S 5.2]{Koepsell2004a}} has been adopted in some form by many circumvention systems. In this model, there are many proxies in operation. They may be full-fledged general-purpose proxies, @@ -1105,55 +1100,58 @@ They may be operated by volunteers or coordinated centrally. In any case, the success of the system hinges on being able to sustain a population of proxies, and distribute information about them to legitimate users, -without revealing them all to the censor. +without revealing too many to the censor. Both of these considerations pose challenges. +\index{Tor bridge|(textbf} Tor's blocking resistance design~\cite{tor-techreport-2006-11-001}, based on secret proxies called ``bridges,'' was of this kind. -Volunteers run bridges, which report themselves to central database +Volunteers run bridges, which report themselves to a central database called BridgeDB~\cite{BridgeDB}\index{BridgeDB}. Clients contact BridgeDB through some unblocked out-of-band channel -(HTTPS, email, or word of mouth) in order to learn bridge addresses. -The BridgeDB server takes steps to prevent easy enumeration of the entire database~\cite{tor-bridgedb-spec}. +(HTTPS\index{HTTPS}, email\index{email}, or word of mouth) in order to learn bridge addresses. +The BridgeDB server takes steps to prevent the easy enumeration of its database~\cite{tor-bridgedb-spec}. Each request returns only a small set of bridges, and repeated requests by the same client return the same small set (keyed by a hash of the client's IP address prefix or email address). 
-Requests through the HTTPS interface require the client -to solve a captcha, and email requests are permitted only +Requests through the HTTPS\index{HTTPS} interface require the client +to solve a captcha\index{captcha}, and email\index{email} requests are honored only from the domains of email providers that are known to limit the rate of account creation. The population of bridges is partitioned into ``pools''---one -pool for HTTPS distribution, one for email, and so on---so that -an exploit allowing enumeration of one distribution method -does not affect the others. -But even these defenses may not be enough: -despite public appeals for volunteers to run bridges -(see for example Dingledine's initial call in 2007~\cite{tor-talk-bridge-announce}), +pool for HTTPS\index{HTTPS} distribution, one for email\index{email}, and so on---so that +if an adversary manages to enumerate one of the pools, +it does not affect the bridges of the others. +But even these defenses may not be enough. +Despite public appeals for volunteers to run bridges +(for example Dingledine's initial call in 2007~\indexauthors{\cite{tor-talk-bridge-announce}}), there have never been more than a few thousand of them, -and Dingledine reported in 2011 that the Great Firewall of China -had managed to enumerate both the HTTPS and email distribution -pools~\cites[\S 1]{tor-techreport-2011-05-001}[\S 1]{tor-techreport-2011-10-002}, -presumably taking advantage of its greater resources. -% (A curious fact, though, is that nearly But nearly all clients use the default bridges~\cite{Matic2017a}. -% I will cover this seeming paradox in more detail in -% \autoref{chap:proxy-probe}.) +and Dingledine reported in 2011 that the Great Firewall of China\index{Great Firewall of China} +managed to enumerate both the HTTPS and email +pools~\cites[\S 1]{tor-techreport-2011-05-001}[\S 1]{tor-techreport-2011-10-002}. 
+\index{Tor bridge|)} +\index{proxy distribution problem|(} Tor relies on BridgeDB\index{BridgeDB} to provide address blocking resistance -for all its transports that otherwise only have content obfuscation. +for all its transports that otherwise have only content obfuscation. And that is a great strength of such a system. It enables, to some extent, content obfuscation to be developed independently, and rely on an existing generic proxy distribution mechanism -in order to produce an overall plausibly working system. +in order to produce an overall working system. There is a whole line of research, in fact, on the question of how best to distribute information about an existing population of proxies, -which is known as the ``bridge distribution problem'' +which is known as the ``proxy distribution problem'' or ``proxy discovery problem.'' -I will give just a summary of various proposals. -\todo{Short summaries of proxy distribution papers.} -\todo{Better understanding of Kaleidoscope.} -\todo{Enemy at the Gateways~\cite{Nasr2017a}} +Proposals such as +Proximax\index{Proximax}~\cite{McCoy2011a}, +rBridge\index{rBridge}~\cite{Wang2013a}, and +Salmon\index{Salmon}~\cite{Douglas2016a} +aim to make proxy distribution robust +by tracking the reputation of clients +and the unblocked lifetimes of proxies. + % Keyspace hopping~\cite{Feamster2003a} has each client switch % between a small number of proxies according to a pseudorandom schedule; % Kaleidoscope~\cite{Sovran2008b,Sovran2008a} @@ -1161,116 +1159,119 @@ I will give just a summary of various proposals. 
% Mahdian~\cite{Mahdian2010a} % treats algorithmically a simplified version of the problem % and shows how to isolate malicious client nodes; -% Proximax~\cite{McCoy2011a} -% rBridge~\cite{Wang2013a} -% Salmon~\cite{Douglas2016a} % Hyphae~\cite{LovecruftDeValence2017a} +% Enemy at the Gateways~\cite{Nasr2017a} +\index{proxy distribution problem|)} -A way to make proxy distribution more robust against censors -(but at the same time less usable by clients) +A~way to make proxy distribution more robust against censors +(but at the same time less usable\index{usability} by clients) is to ``poison'' the set of proxy addresses with the addresses of important servers, -blocking which would result in high collateral damage. -VPN Gate employed this idea~\cite[\S 4.2]{Nobori2014a}, +the blocking of which would result in high collateral damage. +VPN Gate\index{VPN Gate} employed this idea~\cite[\S 4.2]{Nobori2014a}, mixing into the their public proxy list the addresses of root DNS servers\index{DNS} -and Windows Update servers. +and Windows Update\index{Windows Update} servers. +\index{port scanning|(} Apart from ``in-band'' discovery of bridges via subversion of a proxy distribution system, one must also worry about ``out-of-band'' discovery, for example by mass scanning~\cites[\S 6]{tor-techreport-2011-10-002}[\S 9.3]{tor-techreport-2006-11-001}. Durumeric et~al. found about 80\% of existing (unobfuscated) -Tor bridges~\cite[\S 4.4]{Durumeric2013a} +Tor bridges\index{Tor bridge}~\indexauthors{\cite[\S 4.4]{Durumeric2013a}} by scanning all of IPv4 on a handful of common bridge ports. % surf and serve~\cite{McLachlan2009a} (didn't actually scan) % extensive analysis~\cite{Ling2012a} (didn't scan) -Matic et~al.\ had similar results in 2017~\cite[\S V.D]{Matic2017a}, +Matic et~al.\ had similar results in 2017~\indexauthors{\cite[\S V.D]{Matic2017a}}, using public search engines in lieu of active scanning. 
The best solution to the scanning problem is to -do as ScrambleSuit and obfs4 do, +do as ScrambleSuit\index{ScrambleSuit}~\cite{Winter2013b}, +obfs4\index{obfs4}~\cite{obfs4}, +and Shadowsocks\index{Shadowsocks}~\cite{Shadowsocks} do, and associate with each proxy a secret, -without which a client cannot initiate a connection. -The critical part is that the -IP address and port must not constitute -the whole of the information needed to connect to the proxy. +without which a scanner cannot initiate a connection. Scanning for bridges is closely related to active probing, the topic of \autoref{chap:active-probing}. +\index{port scanning|)} -An alternative way of achieving address blocking resistance +Another way of achieving address blocking resistance is to treat proxies as temporary and disposable, rather than permanent and valuable. This is the idea underlying -flash proxy~\cite{Fifield2012a-local} and Snowflake~\cite{snowflake-wiki}\index{Snowflake}. -(Snowflake is the topic of \autoref{chap:snowflake}.) -Even proxy distribution strategies that take churn into account -have in mind proxies that last on the order of at least days. +flash proxy\index{flash proxy}~\cite{Fifield2012a-local} +and Snowflake\index{Snowflake}~(\autoref{chap:snowflake}). +Most proxy distribution strategies +are designed around proxies lasting at least on the order of days. In contrast, disposable proxies may last only minutes or hours. Setting up a Tor bridge or even something lighter-weight -like a SOCKS proxy still requires installing some software +like a SOCKS\index{SOCKS} proxy still requires installing some software on a server somewhere. -Flash proxy and Snowflake\index{Snowflake} proxies have a low set-up and tear-down cost: +Flash proxy and Snowflake proxies have a low set-up and tear-down cost: you can run one just by visiting a web page. 
-These designs do not to need a sophisticated proxy distribution strategy +These designs do not need a sophisticated proxy distribution strategy as long as the rate of proxy creation is kept higher than the censor's rate of discovery. The logic behind diffusing many proxies widely -is that a censor would have to block large swaths of the Internet +is that a censor would have to block large swaths of the Internet\index{blocking!by address} in order to effectively block them. However, it also makes sense to take the opposite tack: have just one or a few proxies, -but choose them to have such high collateral damage +but choose them to have high enough collateral damage\index{collateral damage} that the censor does not dare block them. % Pudd'nhead Wilson: Put all your eggs in the one basket and---watch that basket! -Refraction networking~\cite{refraction-network}, -also called decoy routing, +Refraction networking\index{refraction networking}~\cite{refraction-network} puts proxy capability into network routers---in the middle of paths, rather than at the end. -Clients tag certain flows in a way that is invisible +Clients cryptographically tag certain flows in a way that is invisible to the censor but detectable to a refraction-capable router, which redirects from its apparent destination to some other, covert destination. -The censor has to induce routes that avoid the special routers~\cite{Schuchard2012a}, +In order to prevent circumvention, +the censor has to induce routes that avoid the special routers~\cite{Schuchard2012a}, which is costly~\cite{Houmansadr2014a}. -Domain fronting~\cite{Fifield2015a-local}\index{domain fronting} +Domain fronting\index{domain fronting}~\cite{Fifield2015a-local} has similar properties. Rather than a router, it uses another kind of network intermediary: a content delivery network. 
-Using properties of HTTPS, a client may request one site +Using properties of HTTPS\index{HTTPS}, a client may request one site while appearing (to the censor) to request another. Domain fronting is the topic of \autoref{chap:domain-fronting}. The big advantage of this general strategy is that the proxies do not need to be kept secret from the censor. +\index{address spoofing|(} The final strategy for address blocking resistance is address spoofing. The notable design in this category is -CensorSpoofer~\cite{Wang2012a}. +CensorSpoofer\index{CensorSpoofer}~\cite{Wang2012a}. A CensorSpoofer client never communicates directly with a proxy. It sends upstream data -through a low-bandwidth, indirect channel such as email or instant messaging, -and downstream data through a simulated VoIP conversation, +through a low-bandwidth, indirect channel such as email\index{email} or instant messaging\index{instant messaging}, +and downstream data through a simulated VoIP\index{VoIP} conversation, spoofed to appear as if it were coming from some unrelated dummy IP address. The asymmetric design is feasible because of the nature of web browsing: typical clients send much less than they receive. The client never even needs to know the actual address of the proxy, -meaning that CensorSpoofer has high resistance to insider attack: +meaning that CensorSpoofer\index{CensorSpoofer} has high resistance to insider attack\index{insider attack}: even running the same software as a legitimate client, the censor does not learn enough information to effect a block. The idea of address spoofing goes back farther; -as early as 2001 -TriangleBoy~\cite{SafeWebTriangleBoy} +as early as 2001, +TriangleBoy\index{TriangleBoy}~\cite{SafeWebTriangleBoy} employed lighter-weight intermediate proxies that -would simply forward client requests +simply forwarded client requests to a long-lived proxy at a static, easily blockable address. 
% But http://nms.csail.mit.edu/papers/disc-pet2003.pdf footnote 3 says: "TriangleBoy nodes must be trusted (since they are intermediaries in the SSL handshake with the Safeweb server). In the downstream direction, the long-lived proxy would, rather than route back through the intermediate proxy, -spoof its responses so they appeared to originate from the intermediate proxy. -TriangleBoy did not match CensorSpoofer's resistance to insider attack, +only spoof its responses to look as if they came from the intermediate proxy. +TriangleBoy\index{TriangleBoy} did not match +CensorSpoofer's\index{CensorSpoofer} resistance to insider attack\index{insider attack}, because clients still needed to find and communicate directly with a proxy, -so the whole system basically reduced to the proxy discovery problem, +so the whole system basically reduced to the proxy discovery problem\index{proxy distribution problem}, despite the use of address spoofing. +\index{address spoofing|)} % ReQrypt~\cite{ReQrypt}, introduced in 2017, % proxies only in one direction. @@ -1283,125 +1284,128 @@ despite the use of address spoofing. \index{spheres of influence/visibility|(} -\begin{itemize} -\item Deniable Liaisons~\cite{Narain2014a} -\end{itemize} - -It is usual to assume (conservatively) +\index{blocking!versus detection} +\index{detection!versus blocking} +It is usual to assume, conservatively, that whatever the censor can detect, -it also can block. -That is, to ignore blocking per~se +it also can block; +that is, to ignore blocking per~se and focus only on the detection problem. We know from experience, however, that there are cases in practice where a censor's reach exceeds its grasp: where it is able to detect circumvention -but not block it, -Sometimes it is useful to consider this possibility when modeling. -Khattak, Elahi, et~al.~\cite{Khattak2016a} +but for some reason cannot block it. +It may be useful to consider this possibility when modeling.
+Khattak, Elahi, et~al.~\indexauthors{\cite{Khattak2016a}} express it nicely by subdividing the censor's network into a \emph{sphere of influence} within which the censor has active control, and a potentially larger \emph{sphere of visibility} -within which the censor may only observe, not act. +within which the censor may only observe, but not act. A landmark example of this kind of thinking is the 2006 research on -``Ignoring the Great Firewall of China'' by Clayton et~al.~\cite{Clayton2006a}. +``Ignoring the Great Firewall of China''\index{Great Firewall of China} +by Clayton et~al.~\indexauthors{\cite{Clayton2006a}}. They found that the firewall would block connections by injecting\index{packet injection} phony TCP\index{TCP} RST\index{RST (TCP flag)} packets (which cause the connection to be torn down) or SYN/ACK\index{SYN (TCP flag)}\index{ACK (TCP flag)}\index{SYN/ACK (TCP flags)} packets -(which cause the client to become unsynchronized), +(which cause the connection to become unsynchronized), and that simply ignoring the anomalous packets rendered blocking ineffective. -(Why then, did the censor choose to \emph{inject} its own packets, -rather than \emph{drop} the client's or server's? +(Why did the censor choose to \emph{inject} its own packets, +rather than \emph{drop}\index{packet dropping+} those of the client or server? The answer is probably that injection is technically easier to achieve, highlighting a limit on the censor's power.) One can think of this ignoring as shrinking the censor's sphere of influence: it can still technically act within this sphere, -but not in a way that actually effects blocking. +but not in a way that actually achieves blocking. Additionally, intensive measurements revealed many failures to block, and blocking rates that changed over time, suggesting that even when the firewall intends -a general policy of blocking, it does not always succeed. +a policy of blocking, it does not always succeed. 
Another fascinating example of ``look, but don't touch'' -communication is the ``filecasting'' technique used by Toosheh~\cite{toosheh}, -a file distribution service based on satellite TV broadcasts. +communication is the ``filecasting''\index{filecasting} +technique used by Toosheh\index{Toosheh}~\cite{toosheh}, +a file distribution service based on satellite television\index{satellite television}\index{television} broadcasts. Clients tune their satellite receivers to a certain channel and record the broadcast to a USB flash drive. Later, they run a program on the recording that decodes the information and extracts a bundle of files. The system is unidirectional: clients -can only receive the files that the Toosheh operators choose to provide. -The censor can easily see that Toosheh is in use---it's +can only receive the files that the operators choose to provide. +The censor can easily see that Toosheh\index{Toosheh} is in use---it's a broadcast, after all---but cannot identify users, or block the signal -in any way short of continuous radio jamming or +in any way short of continuous radio jamming\index{radio jamming} or tearing down satellite dishes. % But news reports say that the government of Iran does jam? % http://www.pbs.org/wgbh/pages/frontline/tehranbureau/2012/11/briefing-satellite-wars-why-iran-keeps-jamming.html % https://www.wired.com/2016/04/ingenious-way-iranians-using-satellite-tv-beam-banned-data/ "Yahsat’s satellite hovers over the Middle East, making it harder for the Iranian government to jam the satellite’s signal as it’s broadcast directly down to Iranian dishes" +\index{intrusion detection|(} There are parallels between the study of Internet censorship -and that of network intrusion detection. +and that of network intrusion detection\index{intrusion detection}. 
One is that a censor's detector may be implemented as a -network intrusion detection system or monitor, +network intrusion detection system or monitor\index{network monitor}, a device ``on the side'' of a communication link that receives a copy of the packets that flow over the link, but that, unlike a router, is not responsible for forwarding the packets onward. Another parallel is that censors are susceptible to the same kinds of evasion and obfuscation attacks that affect network monitors more generally. -In 1998, Ptacek and Newsham~\cite{Ptacek1998a} -and Paxson~\cite[\S 5.3]{Paxson1999a} outlined various attacks +In 1998, Ptacek and Newsham~\indexauthors{\cite{Ptacek1998a}} +and Paxson~\indexauthors{\cite[\S 5.3]{Paxson1999a}} outlined various attacks against network intrusion detection systems---such as -manipulating the IP time-to-live field -or sending overlapping IP fragments---that +manipulating the IP time-to-live\index{TTL} field +or sending overlapping IP fragments\index{fragmentation}---that cause a monitor either to accept what the receiver will reject, or reject what the receiver will accept. A basic problem is that a monitor's position in the middle of the network -does not able it to predict exactly how each packet will be interpreted +does not enable it to predict exactly how each packet will be interpreted by the endpoints. 
-Cronin et~al.~\cite{Cronin2005a} posit that the monitor's -conflicting goals of +Cronin et~al.~\indexauthors{\cite{Cronin2005a}} posit that the monitor's +conflicting goals of sensitivity (recording all that is relevant) and selectivity (recording \emph{only} what is relevant) -give rise to an unavoidable ``eavesdropper's dilemma.'' +give rise to an unavoidable ``eavesdropper's dilemma\index{eavesdropper's dilemma}.'' % risks of flow blocking (Telex/TapDance~\cite{Frolov2017a-local}) % http://www.icir.org/vern/papers/activemap-oak03.pdf % http://www.icir.org/vern/papers/norm-usenix-sec-01.pdf +\index{intrusion detection|)} Monitor evasion techniques can be used to reduce a censor's sphere of visibility---eliminating certain traffic features from its consideration. -Crandall et~al.~\cite{Crandall2007a} in 2007 suggested -using IP fragmentation to prevent keyword matching -(splitting keywords across fragments). -In 2008 and 2009, Park and Crandall~\cite{Park2010a} explicitly characterized -the Great Firewall as a network intrusion detection system +Crandall et~al.~\indexauthors{\cite{Crandall2007a}} in 2007 suggested +using IP fragmentation\index{fragmentation} to prevent keyword\index{keywords} matching. +In 2008 and 2009, Park and Crandall~\indexauthors{\cite{Park2010a}} explicitly characterized +the Great Firewall\index{Great Firewall of China} +as a network intrusion detection system\index{intrusion detection} and found that a lack of TCP reassembly\index{TCP!reassembly} allowed evading keyword matching. -Winter and Lindskog~\cite{Winter2012a} found that -the Great Firewall still did not do TCP segment reassembly in 2012, -in the course of studying the firewall's proxy-discovery probes. -(Such probes are the subject of \autoref{chap:active-probing}.) -They released a tool, brdgrd~\cite{brdgrd}, +Winter and Lindskog~\indexauthors{\cite{Winter2012a}} found that +the Great Firewall still did not do TCP segment reassembly in 2012. 
+They released a tool, brdgrd\index{brdgrd}~\cite{brdgrd}, that by manipulating the TCP window size\index{TCP!window size}, prevented the censor's scanners from receiving a full response -in the first packet, thereby foiling active probing. -They reported that the tool stopped working in 2013\cite[\S Software]{Winter2012a-webpage}. -Anderson~\cite{Anderson2012splinternet} gave technical information +in the first packet, thereby foiling active probing\index{active probing}. +Anderson~\indexauthors{\cite{Anderson2012splinternet}} gave technical information on the implementation of the Great Firewall as it existed in 2012, -and observed that it is implemented as an ``on-the-side'' monitor. +and observed that it is implemented as an ``on-the-side'' monitor\index{network monitor}. % https://www.cs.kau.se/philwint/gfw/ -Khattak et~al.~\cite{Khattak2013a} applied a wide array +Khattak et~al.~\indexauthors{\cite{Khattak2013a}} applied a wide array of evasion experiments to the Great Firewall in 2013, identifying classes of working evasions and estimating the cost to counteract them. -\todo{Your State is Not Mine~\cite{Wang2017a}} +Wang et~al.~\indexauthors{\cite{Wang2017a}} +did further evasion experiments against the Great Firewall +a few years later, finding that the firewall had +evolved to prevent some previous evasion techniques, +and discovering new ones. \index{spheres of influence/visibility|)} @@ -1409,17 +1413,17 @@ estimating the cost to counteract them. \section{Early censorship and circumvention} Internet censorship and circumvention began to rise to importance -in the mid-1900s, coinciding with the popularization of the World Wide Web. -At that time, online censorship focused mainly on the web. -Computer security companies were developing technology -for IP address, URL, and web page filtering. +in the mid-1990s, coinciding with the popularization of the World Wide Web\index{World Wide Web}. +% At that time, online censorship focused mainly on the web. 
+% Computer security companies were developing technology +% for IP address, URL, and web page filtering. Even before national-level censorship by governments became an issue, researchers investigated the blocking policies of personal firewall products---those intended, for example, for parents to install on the family computer. -Meeks and McCullagh~\cite{Meeks1996a} reported in 1996 +Meeks and McCullagh~\indexauthors{\cite{Meeks1996a}} reported in 1996 on the secret blocking lists of several programs. -Bennett Haselton and Peacefire~\cite{Peacefire-censorware} +Bennett Haselton\index{Haselton, Bennett} and Peacefire\index{Peacefire}~\cite{Peacefire-censorware} found many cases of programs blocking more than they claimed, including web sites related to politics and health. @@ -1428,46 +1432,47 @@ including web sites related to politics and health. Governments were not far behind in building legal and technical structures to control the flow of information -on the web. -The term ``Great Firewall of China'' first appeared in an article in -\textsl{Wired} magazine~\cite{wired-china-3} in 1997. -In some cases adapting the same +on the web, +in some cases adapting the same technology originally developed for personal firewalls. -In the wake of the first signs of blocking by ISPs, % DFN/Radikal? +The term ``Great Firewall of China\index{Great Firewall of China}'' +first appeared in an article in +\textsl{Wired}\index{Wired@\textsl{Wired}}~\cite{wired-china-3} in 1997. +In the wake of the first signs of blocking by ISPs\index{Internet service provider}, % DFN/Radikal? people were thinking about how to bypass filters. The circumvention systems of that era were largely HTML-rewriting web proxies\index{HTML-rewriting proxy}: -essentially a form on a web page into which a client would enter a URL. -The server would fetch the desired URL on behalf of the client, +essentially a form on a web page into which a client would enter a URL\index{URL}. 
+The server would fetch the desired page on behalf of the client, and before returning the response, rewrite all the links -and external references in the page to make the relative +and external references in the page to make them relative to the proxy. -CGIProxy~\cite{CGIProxy}, -SafeWeb~\cite{Martin2002a}, -Circumventor~\cite{Peacefire-circumventor}, -and the first version of Psiphon~\cite{Psiphon1.0} +CGIProxy\index{CGIProxy}~\cite{CGIProxy}, +SafeWeb\index{SafeWeb}~\cite{Martin2002a}, +Circumventor\index{Circumventor}~\cite{Peacefire-circumventor}, +and the first version of Psiphon\index{Psiphon}~\cite{Psiphon1.0} were all of this kind. These systems were effective against the censors of their day---at -least with respect to destination blocking. -And they had the major advantage of requiring no -special client-side software other than a web browser. +least with respect to the blocking of destinations. +They had the major advantage of requiring no +special client-side software other than a web browser\index{web browser+}. The difficulty they faced was second-order blocking as censors discovered and blocked the proxies themselves. Circumvention designers deployed some countermeasures; -for example Circumventor had a mailing list~\cite[\S 7.4]{tor-techreport-2006-11-001} +for example Circumventor\index{Circumventor} had a mailing list~\cite[\S 7.4]{tor-techreport-2006-11-001} which would send out fresh proxy addresses every few days. -A 1996 article by Rich Morin~\cite{Morin1996Rover} -presented a prototype HTML-rewriting proxy\index{HTML-rewriting proxy} called Rover, -which eventually became CGIProxy. +A 1996 article by Rich Morin~\indexauthors{\cite{Morin1996Rover}} +presented a prototype HTML-rewriting proxy\index{HTML-rewriting proxy} called Rover\index{Rover}, +which eventually became CGIProxy\index{CGIProxy}.
The article predicted the failure of censorship -based on URL or IP address, +based on URL or IP address\index{detection!by address}, as long as a significant fraction of web servers ran such proxies. -That vision clearly did not come to pass. +That vision has not come to pass. Accumulating a sufficient number of proxies and communicating their addresses securely to clients---in -short, the proxy distribution problem---turned +short, the proxy distribution problem\index{proxy distribution problem}---turned out not to follow automatically, but to be a major sub-problem of its own. @@ -1479,24 +1484,25 @@ censor capabilities. The first censors would be considered weak by today's standards, mostly easy to circumvent by simple countermeasures, -such as tweaking a protocol or using an alternative DNS server.\index{DNS} +such as tweaking a protocol or using an alternative DNS\index{DNS} server. (We see the same progression play out again -when countries begin to experiment with censorship, -such as in Turkey in 2014, where alternative DNS servers\index{DNS} -briefly sufficed to circumvent a block of Twitter~\cite{theguardian-how-to-get-around-turkeys-twitter-ban}.\index{DNS!blocking}) +when countries first begin to experiment with censorship, +such as in Turkey\index{Turkey} in 2014, where alternative DNS servers\index{DNS} +briefly sufficed to circumvent a block of Twitter\index{Twitter}~\cite{theguardian-how-to-get-around-turkeys-twitter-ban}.\index{DNS!blocking}) Not only censors were changing---the world around them was changing as well. -In this field that is so heavily affected by concerns -about collateral damage, the milieu in which +In the field of circumvention, which is so heavily affected by concerns +about collateral damage\index{collateral damage}, the milieu in which censors operate is as important as the censors themselves.
-A good example of this is the paper on Infranet, +A good example of this is the paper on Infranet\index{Infranet}, the first academic circumvention design I am aware of. -Its authors argued, in 2001, -that TLS would not suffice as a cover protocol~\cite[\S 3.2]{Feamster2002a}, +Its authors argued, not unreasonably for 2001, +that TLS\index{TLS} would not suffice as a cover protocol~\cite[\S 3.2]{Feamster2002a}, because the relatively few TLS-using services at that time -could \emph{all} be blocked without much harm. +could all be blocked without much harm. Certainly the circumstances are different today---domain -fronting\index{domain fronting} and all refraction networking schemes require +fronting\index{domain fronting} and all +refraction networking\index{refraction networking} schemes require the censor to permit TLS. As long as circumvention remains relevant, it will have to change along with changing times, @@ -1657,7 +1663,7 @@ Gwagwa ``A study of {Internet}-based information controls in {Rwanda}''~\cite{Gw One of the earliest technical studies of censorship occurred in a place you might not expect, the German state of North Rhine-Westphalia. -In 2003, Dornseif~\cite{Dornseif2003a} tested ISPs' implementation +In 2003, Dornseif~\cite{Dornseif2003a} tested ISPs'\index{Internet service provider} implementation of a controversial legal order to block web sites. While there were many possible ways to implement the block, none were trivial to implement, nor free of overblocking side effects. @@ -1672,7 +1678,7 @@ This time period seems to be near the onset of DNS tampering in general; Dong~\cite{Dong2002a} had reported it in China in late~2002.
Clayton~\cite{Clayton2006b} in 2006 studied a ``hybrid'' blocking system, -CleanFeed by the British ISP BT, +CleanFeed\index{CleanFeed} by the British ISP\index{Internet service provider} BT\index{BT}, that aimed for a better balance of costs and benefits: a ``fast path'' IP address and port matcher acted as a prefilter for the ``slow path,'' a full HTTP proxy. @@ -1708,14 +1714,14 @@ that would become a major theme of future censorship modeling: censors are forced to trade blocking effectiveness against performance. In order to cope with high load at a reasonable cost, -censors may choose the architecture of a network monitor +censors may choose the architecture of a network monitor\index{network monitor} or intrusion detection system, one that can passively monitor and inject packets, but cannot delay or drop them. Nearly contemporary studies by Wolfgarten~\cite{Wolfgarten2006a} and Tokachu~\cite{Tokachu2006a} found cases of -DNS tampering, search engine filtering, and RST injection\index{RST (TCP flag)} +DNS tampering, search engine filtering, and RST injection\index{RST (TCP flag)}\index{packet injection+} caused by keyword detection. In 2007, Lowe, Winters, and Marcus~\cite{Lowe2007a} did detailed experiments on DNS tampering in China.\index{DNS!poisoning} @@ -1724,8 +1730,13 @@ against a list of about 950 likely-censored domains. For about 400 domains, responses came back with bogus IP addresses, chosen from a set of about 20 distinct IP addresses.
Eight of the bogus addresses were used more than the others: -a whois lookup placed them in Australia, Canada, China, Hong Kong, and the U.S.\index{United States of America} -By manipulating TTLs, the authors found that the false responses +a whois lookup placed them in +Australia\index{Australia}, +Canada\index{Canada}, +China\index{China}, +Hong Kong\index{Hong Kong}, +and the U.S.\index{United States of America} +By manipulating TTLs\index{TTL}, the authors found that the false responses were injected by an intermediate router: the authentic response would be received as well, only later. A more comprehensive survey~\cite{Anonymous2014a} @@ -1782,7 +1793,7 @@ and later Ensafi et~al.~\cite{Ensafi2015b} did a formal investigation into active probing, a reported capability of the Great Firewall since around October 2011. They focused on the firewall's probing of Tor -and its most common pluggable transports. +and its most common pluggable transports\index{pluggable transports}. Anderson~\cite{Anderson2013a-local} documented network throttling in Iran, @@ -1811,7 +1822,7 @@ forcing TCP's\index{TCP} usual recovery. Khattak et~al.~\cite{Khattak2013a} evaluated the Great Firewall from the perspective that it works like -an intrusion detection system or network monitor, +an intrusion detection system\index{intrusion detection} or network monitor\index{network monitor}, and applied existing techniques for evading a monitor to the problem of circumvention. They looked particularly for ways to @@ -2010,7 +2021,7 @@ found at least six shifts in policy during two weeks of site blocking. They observed an escalation in blocking in Turkey: the authorities first poisoned DNS\index{DNS!poisoning} for twitter.com, then blocked the IP addresses of the Google public DNS servers\index{DNS}, -then finally blocked Twitter's IP addresses directly. +then finally blocked Twitter's\index{Twitter} IP addresses directly. In Russia, they found ten unique bogus IP addresses used to poison DNS.
\todo[inline]{ @@ -2029,7 +2040,7 @@ under a given model; the evaluation is therefore meaningful only as far as the threat model reflects reality. Without grounding in reality, researchers -risk running an imaginary arms race +risk running an imaginary arms race\index{arms race} that evolves independently of the real one. I~took part, @@ -2500,7 +2511,7 @@ but it has a weakness that makes it trivial to identify, passively and retroactively\index{detection!by content}, needing only the first 20 bytes sent by the client. We turned the weakness of obfs2 to our advantage. -It allowed us to distinguish obfs2 from other +It allowed us to distinguish\index{distinguishability} obfs2 from other random-looking payloads, isolating a set of connections that could belong only to legitimate circumventors or to active probers. @@ -2713,7 +2724,7 @@ the address 202.108.181.70\index{202.108.181.70 (active prober)}, which by itself accounted for 2\% of the probes. (Even this large fraction stands in contrast to previous studies, where that single IP address accounted for roughly half the probes~\cite[\S 4.5.1]{Winter2012a}.) -Among the address ranges are ones belonging to residential ISPs. +Among the address ranges are ones belonging to residential ISPs\index{Internet service provider}. Despite the many source addresses, the probes seem to be managed @@ -2749,7 +2760,7 @@ or at the application layer (like ScrambleSuit\index{ScrambleSuit}, obfs4\index{obfs4}, and Shadowsocks\index{Shadowsocks}). -But when that is not possible, one might hope to distinguish probers by their fingerprints, +But when that is not possible, one might hope to distinguish\index{distinguishability} probers by their fingerprints, idiosyncrasies in their implementation that make them stand out from ordinary clients.
In the case of the Great Firewall\index{Great Firewall of China}, @@ -3015,7 +3026,7 @@ experiments for us---such as adding a bridge to the source code but commenting it out---that are further detailed below. We were only concerned with default bridges, not secret ones\index{Tor bridges}. -Our goal was not to estimate the difficulty of the bridge discovery problem, +Our goal was not to estimate the difficulty of the proxy discovery problem\index{proxy discovery problem}, but to better understand how censors deal with what seems to be a trivial task. We focused on bridges using the obfs4 pluggable transport~\cite{obfs4}\index{obfs4}, which not only is the most-used transport and the one @@ -3852,7 +3863,7 @@ From the censor's point of view, messages appear to go not to their actual (presumably blocked) destination, but to some other \emph{front domain}\index{front domain}, one whose blocking would result in high collateral damage\index{collateral damage}. -Because (with certain caveats) the censor cannot distinguish +Because (with certain caveats) the censor cannot distinguish\index{distinguishability} domain-fronted HTTPS requests from ordinary HTTPS requests, it cannot block circumvention without also blocking the front domain. Domain fronting primarily addresses the problem @@ -4003,7 +4014,7 @@ tolerance for collateral damage\index{collateral damage}. But the lack of secrecy makes the censor's choice stark: allow circumvention, or block a domain. This is the way to think about circumvention in general: -not ``can it be blocked?'' +not ``can it be blocked?''\index{unblockability} but ``what does it cost to block?'' @@ -4265,7 +4276,7 @@ data downstream. \index{TLS!fingerprinting|(} Even with domain fronting to hide the destination request, -a censor may try to distinguish circumventing HTTPS\index{HTTPS} connections +a censor may try to distinguish\index{distinguishability} circumventing HTTPS\index{HTTPS} connections by their TLS fingerprint. 
TLS implementations have a lot of latitude in composing their handshake messages, enough that it is possible to @@ -4488,7 +4499,7 @@ because active probing doesn't help the censor in those cases anyway. During the spring 2014 semester (January--May) I~was enrolled in Vern Paxson's\index{Paxson, Vern} Internet/Network Security\index{Internet/Network Security} course -along with fellow student Chang Lan\index{Lan, Change}. +along with fellow student Chang Lan\index{Lan, Chang}. We made the development and security evaluation of meek our course project. During this time we built browser TLS camouflage extensions\index{TLS!fingerprinting}, @@ -4639,8 +4650,8 @@ and appeared on June~30 at the symposium. The increasing use of domain fronting by various circumvention tools began to attract more attention. -A March 2015 article by Eva Dou and Alistair Barr -in the Wall Street Journal\index{Wall Street Journal}~\indexauthors{\cite{DouBarrWallStreetJournal}} +A March 2015 article by Eva Dou and Alistair Barr in +\textsl{The Wall Street Journal}\index{Wall Street Journal, The@\textsl{Wall Street Journal, The}}~\indexauthors{\cite{DouBarrWallStreetJournal}} described domain fronting and ``collateral freedom''\index{collateral freedom@``collateral freedom''} in general, depicting cloud service providers as being caught in the crossfire @@ -5342,7 +5353,7 @@ WebRTC\index{WebRTC} fingerprinting. Snowflake will always look like WebRTC\index{WebRTC}---that's unavoidable without a major change in architecture. Therefore the best we can hope for is to make -Snowflake's WebRTC hard to distinguish from other +Snowflake's WebRTC hard to distinguish\index{distinguishability} from other applications of WebRTC. And that alone is not enough---it also must be that the censor is reluctant to block @@ -5401,7 +5412,8 @@ SRTP (Secure Real-time Transport Protocol)~\cite{rfc3711} and data channels use DTLS (Datagram TLS)~\cite{rfc6347}. 
Even though the contents of both are encrypted, -an observer can easily distinguish a media channel from a data channel. +an observer can easily distinguish\index{distinguishability} +a media channel from a data channel\index{WebRTC!media channel versus data channel}. Applications that use media channels have options for doing key exchange: some borrow the DTLS handshake in a process called