diff --git a/thesis.tex b/thesis.tex index 23aa94e..d8e4ee6 100644 --- a/thesis.tex +++ b/thesis.tex @@ -393,6 +393,11 @@ Gabi Nakibly and Dan Boneh. \chapter{Principles of circumvention} \label{chap:principles} +\begin{itemize} +\item Pluggable transports +% ss now has its own plugin system +\end{itemize} + In order to understand the challenges of circumvention, it helps to put yourself in the mindset of a censor. A censor has two high-level functions: detection and blocking. @@ -519,10 +524,8 @@ Endpoint blocking and traffic obfuscation correspond to my detection by address and detection by content; bootstrapping is the challenge of getting a copy of circumvention software and discovering initial proxy addresses. -I tend to fold bootstrapping in with address-based detection, -though for details on one aspect of the problem, -that of discovering bridge addresses, -see \autoref{sec:bridge-distribution}. +I tend to fold bootstrapping in with address-based detection; +see \autoref{sec:address-strategies}. Khattak, Elahi, et~al., in their 2016 survey and systematization of circumvention systems, break detection into four aspects: @@ -533,7 +536,7 @@ and protocol semantics~\cite[\S~2.4]{Khattak2016a}. I think of their ``content,'' ``flow properties,'' and ``protocol semantics'' as all fitting under the heading of content. Tschantz et~al.\ identify ``setup'' and ``usage''~\cite[\S~V]{Tschantz2016a-local}, -and Khattak, Elahi et~al.\ identify +and Khattak, Elahi, et~al.\ identify ``communication establishment'' and ``conversation''~\cite[\S~3.1]{Khattak2016a}, as targets of obfuscation; these mostly correspond to address and content. @@ -679,26 +682,24 @@ Moreover, it gets to the heart of what makes traffic resistant to blocking. There have been many other attempts at defining resistance to blocking. -Pfitzmann and Hansen~\cite{Pfitzmann2010a}, -in a work that aimed to define various terms in anonymity and -censorship resistance, -gave the three notions of -``undetectability,'' -``unobservability,'' and -``unblockability.''\todo{define these} Narain et~al.~\cite{Narain2014a} -characterized the essential component as being -``deniability,'' +called the essential element ``deniability,'' meaning that a user could plausibly claim to have been doing something other than circumventing when confronted with a log of their network activity. -Khattak, Elahi, et~al.~\cite[\S~4]{Khattak2016a} also list +Khattak, Elahi, et~al.~\cite[\S~4]{Khattak2016a} also consider ``deniability'' separately from ``unblockability.'' +% \cite{Houmansadr2011a} also says ``deniability'' % \cite{Burnett2010a} also says ``deniability'' % \cite{Jones2014a} also says ``deniability'' +Houmansadr et~al.~\cite{Houmansadr2011a,Houmansadr2013a,Houmansadr2013b} +used the term ``unobservability,'' +which I feel fails to convey that the censor's +essential function is distinguishing, not observation. Brubaker et~al.~\cite{Brubaker2014a} used the term ``entanglement,'' -which inspired a lot of my own thinking. +which is closer to the mark +and inspired my own thinking. What they call entanglement I think of as indistinguishability, and keep in mind that that which you are trying to be indistinguishable with @@ -755,6 +756,11 @@ is exactly such an irrational decision, at the greater societal level. 
 \section{Content obfuscation strategies}
 \label{sec:obfuscation-strategies}

+\begin{itemize}
+\item Sony thing on passive/active detection \cite[\S~5.1]{SladekBroseEANTC}
+\item relation to website fingerprinting---circumvention is potentially harder because you can't just use e.g. constant bitrate
+\end{itemize}
+
 There are two general strategies to counter content-based blocking.
 The first is to mimic some content that the censor allows,
 like HTTP or email.
@@ -788,8 +794,44 @@ tends to cause more collateral damage than blacklisting.
 And just as obfuscation protocols
 are not purely steganographic or polymorphic,
 real censors are not purely whitelisting or blacklisting.
+Houmansadr et~al.~\cite{Houmansadr2013b}
+exhibited weaknesses in ``parrot'' circumvention systems
+that mimic a cover protocol but do not perfectly imitate it.
+Mimicking a protocol in every detail, down to its error behavior,
+is difficult, and any inconsistency is a potential feature
+that a censor may exploit.
+Wang et~al.~\cite{Wang2015a} found that some of
+Houmansadr et~al.'s proposed attacks were impractical,
+due to high false-positive rates,
+but proposed other attacks designed for efficiency
+and low false positives,
+against both steganographic and polymorphic protocols.
+Geddes et~al.~\cite{Geddes2013a} showed that even perfect imitation
+(achieved via tunneling) may leave vulnerabilities
+due to mismatches between the cover protocol and
+the covert protocol---for instance, randomly dropping packets
+may disrupt circumvention more than other uses of the cover protocol.
+It's worth noting, though, that apart from active probing and
+perhaps entropy measurement, most of the attacks proposed
+in academic literature have not been used by censors in practice.

-I will list some representative examples of the steganographic strategy.
+Some systematizations
+(for example those of Brubaker et~al.~\cite[\S~6]{Brubaker2014a};
+Wang et~al.~\cite[\S~2]{Wang2015a}; and
+Khattak, Elahi, et~al.~\cite[\S~6.1]{Khattak2016a})
+further subdivide steganographic systems
+into those based on mimicry
+(attempting to replicate the behavior of a cover protocol)
+and tunneling
+(sending through a genuine implementation of the cover protocol).
+I do not find the distinction useful,
+except when speaking of concrete implementation choices;
+to me, there are various degrees of fidelity in imitation,
+and tunneling merely tends to offer higher fidelity
+than mimicry.
+
+I will list some representative circumvention systems
+that exemplify the steganographic strategy.
 Infranet~\cite{Feamster2002a}, way back in 2002,
 built a covert channel out of HTTP,
 encoding upstream data in special requests
@@ -815,31 +857,163 @@ if you can describe it, you can imitate it.
 Despite the research attention they have received,
 steganographic systems have not been as used in practice:
 of these listed systems, FTE is the only one that
-saw substantial deployment.
+has seen substantial deployment.
+
+There are many examples of the randomized, polymorphic strategy.
+An important subclass of these is the so-called
+look-like-nothing systems that encrypt a stream
+without any plaintext header or framing information,
+so that it appears to be a uniformly random byte sequence.
+A pioneering design was the obfuscated-openssh of Bruce Leidl~\cite{Leidl-obfuscated-openssh},
+which aimed to hide the plaintext packet metadata in the SSH protocol.
+obfuscated-openssh worked, in essence, +by first sending a cryptographic key, +then sending ciphertext encrypted with that key. +The encryption of the obfuscation layer was an additional, independent layer +on top of SSH's usual encryption. +A censor could, in principle, purely passively detect and deobfuscate +the protocol just by recovering the key and using it to decrypt the rest---a +situation partially mitigated by the use of an expensive key derivation function +based on iterated hashing. +obfuscated-openssh could optionally incorporate a pre-shared password +into the key derivation function, which would prevent easy identification. +Dust~\cite{Wiley2011a}, a design by Brandon Wiley, +similarly randomized bytes +(at least in its v1 version---later versions +permitted fitting to distributions other than uniform). +It was not susceptible to passive deobfuscation, +relying on an out-of-band key exchange before each session. +Shadowsocks~\cite{Shadowsocks} +is a lightweight encryption layer atop a simple proxy protocol, +widely used in China. + +There is a line of successive look-like-nothing protocols---known by the names +obfs2, obfs3, ScrambleSuit, and obfs4---whose history +is interesting, because it illustrates mutual advances by +censors and circumventors over several years. +obfs2~\cite{obfs2}, which debuted in 2012 in response to +blocking in Iran~\cite{tor-blog-obfsproxy-next-step-censorship-arms-race}, +uses very simple obfuscation inspired by obfuscated-openssh: +it is essentially equivalent to sending an encryption key, +followed by the rest of the stream encrypted by that key. +obfs2 is detectable, with no false negatives and negligible false positives, +by even a passive censor who knows how it works; +and it is vulnerable to active probing attacks, +where the censor speculatively connects to the proxy to see what protocol it uses. +However, it was sufficient against the +keyword- or pattern-based censors of its era. +obfs3~\cite{obfs3}---first available in 2013 +but not really released to users until +2014~\cite{tor-blog-tor-browser-36-released}---was designed +to fix the passive detectability of its predecessor. +obfs3 employs a Diffie--Hellman key exchange\index{UniformDH} that +prevents easy passive detection, +but it can still be subverted by an active man in the middle, +and remains vulnerable to active probing. +(The Great Firewall of China had begun active-probing +for obfs2 by January 2013, and +for obfs3 by February 2015, +or possibly as early as July 2013~\cite[\S~5.4]{Ensafi2015b}.) +ScrambleSuit~\cite{Winter2013b}, +first available to users in 2014~\cite{tor-blog-tor-browser-364-and-40-alpha-1-are-released}, +arose in response to the active-probing of obfs3. +Its improvements were the use of an out-of-band secret +to authenticate clients, +and traffic shaping techniques to perturb the +underlying stream's statistical properties. +When a client connects to a ScrambleSuit proxy, +it must demonstrate knowledge of the out-of-band secret, +or else the server will not respond, +preventing active probing. +(Active probing resistance really has more to do with blocking by address +than with blocking by content, +but it is only because the randomized transports +sufficiently frustrated content-based detection +that active probing became relevant.) 
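+
+To make the passive-detection weakness concrete, the following sketch
+shows the shape of an obfs2-style look-like-nothing handshake.
+It is a schematic illustration, not the actual obfs2 wire format or key
+schedule: the marker value, iteration count, and hash-based stream cipher
+are stand-ins for the real protocol's constants and ciphers.
+Because the only keying material travels in-band, a censor who knows the
+scheme can repeat the derivation and test for the fixed marker without
+interacting with either endpoint.
+
+\begin{verbatim}
+import hashlib
+import os
+
+MAGIC = b"\x2b\xf5\xca\x7e"  # fixed plaintext marker (illustrative value)
+KDF_ROUNDS = 100000          # expensive iterated hashing, as in obfuscated-openssh
+
+def derive_key(seed):
+    # The key depends only on the seed that is sent in the clear.
+    k = seed
+    for _ in range(KDF_ROUNDS):
+        k = hashlib.sha256(k).digest()
+    return k
+
+def keystream(key, n):
+    # Toy hash-based stream cipher; a stand-in for the real cipher layer.
+    out = b""
+    ctr = 0
+    while len(out) < n:
+        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
+        ctr += 1
+    return out[:n]
+
+def xor(a, b):
+    return bytes(x ^ y for x, y in zip(a, b))
+
+def client_hello(payload):
+    # Send a random seed, then the marker and payload encrypted under a
+    # key derived from that seed alone (no out-of-band secret).
+    seed = os.urandom(16)
+    plaintext = MAGIC + payload
+    return seed + xor(plaintext, keystream(derive_key(seed), len(plaintext)))
+
+def censor_detect(flow):
+    # A purely passive censor repeats the derivation and checks for the
+    # marker; knowledge of the scheme is its only requirement.
+    seed, rest = flow[:16], flow[16:]
+    prefix = keystream(derive_key(seed), len(MAGIC))
+    return xor(rest[:len(MAGIC)], prefix) == MAGIC
+
+print(censor_detect(client_hello(b"covert stream bytes")))  # True
+\end{verbatim}
+
+Folding in a secret that never appears on the wire
+(obfuscated-openssh's optional password,
+ScrambleSuit's and obfs4's out-of-band secret),
+or replacing the transmitted seed with a key exchange
+such as obfs3's UniformDH,
+removes exactly this passive-detection capability.
+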
+obfs4~\cite{obfs4}, first available in 2014,
+is an incremental advancement on ScrambleSuit
+that uses more efficient cryptography,
+and additionally authenticates the key exchange
+to prevent active man-in-the-middle attacks.
+
+% obfs4 now used in a variety of projects.
+
+Polymorphic protocols have an advantage over steganographic ones:
+every proxy can potentially have its own characteristics.
+ScrambleSuit and obfs4, in addition to randomizing packet contents,
+shape packet lengths and timing to fit random distributions.
+Crucially, the chosen distributions are consistent within each server,
+not generated afresh for each connection.
+That means that even if a censor is able to build a profile
+of a particular server,
+that profile is not necessarily useful for detecting other servers.
+
+
+\section{Address blocking resistance strategies}
+\label{sec:address-strategies}

 \dragons

-The history of the polymorphic, randomized protocols
-known as obfs2~\cite{obfs2}, obfs3~\cite{obfs3}, and obfs4~\cite{obfs4} is interesting
-because it tells a story of circumventors changing behavior
-in the face of changing censor models.
-All of these protocols aim to encode traffic
-as a uniformly random sequence of bytes,
-leaving no plaintext features for a censor to detect.
-The obfs2 protocol used a fairly naive handshake protocol
-that appeared random only to a first approximation.
-It would have bypassed the keyword- or pattern-based censors
-of its era, but it was detectable passively, using a custom detector.
-obfs3 improved on obfs2 by adding a clever Diffie--Hellman
-key exchange, specially modified to also appear random to a censor.
-obfs3 was not trivially detectable passively,
-but could be attacked by an active man in the middle,
-and was vulnerable to active probing.
-obfs4 added an out-of-band secret
-that foils both man-in-the-middle and active probing attacks.
-ShadowSocks~\cite{Shadowsocks}
+This section is about resistance to blocking by address;
+an obfuscated protocol then prevents blocking by content.
+
+\begin{itemize}
+\item Untrusted Messenger Discovery~\cite{Feamster2003a}
+\item Kaleidoscope~\cite{Sovran2008a,Sovran2008b}
+\item Mahdian~\cite{Mahdian2010a}
+\item Proximax~\cite{McCoy2011a}
+\item rBridge~\cite{Wang2013a}
+\item Salmon~\cite{Douglas2016a}
+\item Hyphae~\cite{LovecruftDeValence2017a}
+\item Enemy at the Gateways~\cite{Nasr2017a}
+\end{itemize}
+
+The GFW enumerated HTTPS- and email-sourced bridges~\cite{ten-ways-discover-tor-bridges}.
+
+In the usual threat models, though, the censor is assumed to be quite powerful,
+capable of dropping, replaying, and forging arbitrary packets,
+of \dots
+There is usually a concession to the censor's needing to operate at line rate,
+or to its needing to protect important communications (which is an argument about collateral damage),
+which provides the weakness that the circumvention system in question exploits.
+We already know that such a strong censor model is a fiction for national censors:
+the GFW, for example, acts like an ``on-path'' network monitor
+that can inject, but not drop, packets.
+The very strong threat model may be
+appropriate for, e.g., whitelisting corporate or university censors.
+The mass censors we know are weak if you are not being specifically targeted.
+Pick a proxy server used by you and no one else;
+do any silly thing for obfuscation, and it will work, because who cares?
+There are true challenges in making it scale to large numbers of users
+and to an adaptive adversary.
+The cat-and-mouse game is not inevitable---don't think of it as
+``circumvention works until it gets popular, then it gets blocked,''
+but rather as
+``you get a free ride until you get popular; after that, your thing has to actually work.''
+
+Generic rendezvous: BridgeDB and others.
+
+Mass scanning for bridges:
+Durumeric et~al.~\cite[\S~4.4]{Durumeric2013a} found about 80\%
+of Tor bridges by scanning TCP ports 443 and 9001 on IPv4.
+
+Depending on physical aspects of networks: Denali.
+
+Infrastructure-based: decoy routing and domain fronting.
+
+Tying questions of ethics\index{ethics} to questions about censor behavior, motivation:
+\cite{Wright2011a} (also mentions ``organisational requirements, administrative burden''),
+\cite{Jones2015a},
+\cite{Crandall2015a}.
+Censors may come to conclusions different from what we expect
+(have a clue or not).
+
 ``Decoy routing'' systems put proxies at the middle of network paths.
 A special cooperating router lies between the client
 and the apparent destination of a TCP stream.
 The router looks for a special cryptographic ``tag'' that is undetectable to the censor.
@@ -852,43 +1026,6 @@ Telex~\cite{Wustrow2011a},
 and Cirripede~\cite{Houmansadr2011a}.

-
-Parrot is dead~\cite{Houmansadr2013b}
-It's worth noting that the kind of detection
-they employ has not been seen used by censors.
-Wang et~al.~\cite{Wang2015a} found some of the attacks
-to be impractical (because of untenable false-positive rates)
-and offer attacks that are more acceptable
-to censors as we imagine them.
-
-
-Some systematizations
-(for example those of Brubaker et~al.~\cite[\S~6]{Brubaker2014a}
-and Khattak, Elahi, et~al.~\cite[\S~6.1]{Khattak2016a})
-draw a distinction between
-mimicking and tunneling systems.
-That particular distinction is not that relevant to me.
-It makes sense when discussing
-concrete implementation strategies,
-but otherwise the difference is only one of degree of fidelity.
-I think of tunneling as a high-fidelity way
-of implementing mimickry.
-
-
-
-Dust~\cite{Wiley2011a}
-ScrambleSuit
-obfs4
-crucially, distribution is specific to a server
-
-relation to website fingerprinting
-disadvantage is that you can't just e.g. use constant bitrate
-- have to look like something else
-
-Traffic transformation
-look like nothing and look like something
-Psiphon anecdote about prepending HTTP to obfssh
-

 \section{Spheres of influence and visibility}

 \begin{itemize}
@@ -1009,67 +1146,8 @@ estimating the cost to counteract them.

 \section{Active probing}

-
-\section{Bridge distribution}
-\label{sec:bridge-distribution}
-
 \dragons
-Resistance to blocking by address;
-obfuscated protocol then prevents blocking by content.
- -\begin{itemize} -\item Untrusted Messenger Discovery~\cite{Feamster2003a} -\item Kaleidoscope~\cite{Sovran2008a,Sovran2008b} -\item Mahdian~\cite{Mahdian2010a} -\item Proximax~\cite{McCoy2011a} -\item rBridge~\cite{Wang2013a} -\item Salmon~\cite{Douglas2016a} -\item Hyphae~\cite{LovecruftDeValence2017a} -\item Enemy at the Gateways~\cite{Nasr2017a} -\end{itemize} - - -In the usual threat models, though, the censor is assumed to be quite powerful, -capable of dropping, replaying, and forging arbitrary packets, -of \dots -there is usually a concession to the censor needing to operate at line rate, -or of needing to protect important communications (which is an argument about collateral damage), -which provides the weakness that the circumvention system in question exploits. -we already know that such a strong censor model is a fiction for national censors, -for example the GFW acts like an ``on-path'' network monitor -that can inject, but not drop, packets. -the very strong threat model may be -appropriate for e.g. whitelisting corporate or university censors - - -The mass censors we know are weak if you are not being specifically targeted -Pick a proxy server used by you and no one else -Do any silly thing for obfuscation, it will work, because who cares -There are true challenges in making it scale to large numbers of users -and an adaptive adversary -The cat-and-mouse game is not inevitable---don't think of it as -``circumvention works until it gets popular, then it gets blocked'' -rather as -``you get a free ride until you get popular, after that your thing has to actually work.'' - -Generic rendezvous: BridgeDB and others - -Mass scanning for bridges -Durumeric et~al.~\cite[\S~4.4]{Durumeric2013a} found about 80\% -of Tor bridges by scanning TCP ports 443 and 9001 on IPv4. - -depending on physical aspects of networks -Denali - -infrastructure-based, decoy routing and domain fronting - -Tying questions of ethics\index{ethics} to questions about censor behavior, motivation: -\cite{Wright2011a} (also mentions ``organisational requirements, administrative burden'') -\cite{Jones2015a} -\cite{Crandall2015a} -Censors may come to conclusions different than what we expect -(have a clue or not). \section{Early censorship and circumvention} @@ -1717,6 +1795,8 @@ I helped analyze the network ``fingerprints'' of active probes and how they might be distinguished from connections by legitimate clients. +surf and serve\cite{McLachlan2009a} + The work on active probing appeared in the 2015 research paper ``Examining How the Great Firewall Discovers Hidden Circumvention Servers''~\cite{Ensafi2015b}, which I coauthored with