diff --git a/thesis.tex b/thesis.tex index 88b9e2d..511da60 100644 --- a/thesis.tex +++ b/thesis.tex @@ -115,6 +115,8 @@ \makeatletter \newcommand{\gobblecomma}[1]{\@gobble{#1}\ignorespaces} \makeatother +\index{active probing!zzz@\gobblecomma|seealso {port scanning}} +\index{address, detection/blocking by,!zzz@\gobblecomma|see {detection/blocking by address}} \index{Amazon CloudFront!zzz@\gobblecomma|seealso {meek-amazon}} \index{Android!zzz@\gobblecomma|seealso {Orbot}} \index{App Engine|see {Google App Engine}} @@ -124,14 +126,13 @@ \index{authentication!zzz@\gobblecomma|seealso {integrity}} \index{Azure|see {Microsoft Azure}} \index{bridge|see {Tor bridge}} -\index{certificate!zzz@\gobblecomma|seealso {common name (X.509; TLS)}} +\index{certificate!zzz@\gobblecomma|seealso {common name (X.509); TLS}} \index{ciphersuite|see {TLS ciphersuite}} \index{content delivery network|see {CDN}} +\index{content, detection/blocking by,!zzz@\gobblecomma|see {detection/blocking by content}} \index{classification!zzz@\gobblecomma|seealso {detection; false positive; false negative}} -\index{CN|see {China; common name (X.509)}} \index{China!zzz@\gobblecomma|seealso {Great Firewall of China}} \index{CloudFront|see {Amazon CloudFront}} -\index{Datagram TLS|see {DTLS}} \index{decoy routing|see {refraction networking}} \index{default bridge|see {Tor bridge, default}} \index{domain fronting!zzz@\gobblecomma|seealso {front domain; meek}} @@ -159,7 +160,7 @@ \index{network address translation|see {NAT}} \index{NIDS|see {intrusion detection}} \index{network intrusion detection system|see {intrusion detection}} -\index{nickname|see {Tor bridge, nickname}} +\index{nickname|see {Tor bridge, nicknames}} \index{OpenSSH|see {obfuscated-openssh}} \index{overblocking|see {false positive}} \index{PETS|see {Privacy Enhancing Technologies Symposium}} @@ -236,22 +237,21 @@ and tools for circumvention that are sound in theory and effective in practice. -% end of cat and mouse - % censorship is an evil to be destroyed. \section{Scope} \index{border firewall|(} -Censorship is an enormous topic, -and Internet censorship is hardly smaller. +Censorship is a big topic, +and even adding the ``Internet'' qualifier +makes it hardly less so. In order to deal with the subject in detail, -it is necessary to limit the scope. -My research is on an +I~must limit the scope. +The subject of this work is an important special case of censorship, which I~call the ``border firewall\index{border firewall}.'' -It is illustrated in \autoref{fig:border-firewall}. +See \autoref{fig:border-firewall}. \begin{figure} \centering @@ -260,8 +260,8 @@ It is illustrated in \autoref{fig:border-firewall}. In the border firewall scenario, a client within a censor-controlled network wants to reach a destination on the outside. -% The nodes and links remind use that there -% is a fabric of hardware beneath our network abstractions. +% The nodes and links are there to remind us that +% a fabric of hardware underlies our network abstractions. } \label{fig:border-firewall} \index{border firewall} @@ -501,7 +501,7 @@ on whatever criteria it finds practical. I like to divide detection techniques into two classes: \emph{detection by content}\index{detection!by content} and \emph{detection by address}\index{detection!by address}. -Detection by content is based on the content or topic\index{content|textbf} +Detection by content is based on the content or topic of the message: keyword\index{keyword filtering} filtering and protocol identification\index{classification} fall into this class. Detection by address is based on the sender or recipient @@ -777,14 +777,14 @@ in mind that that which you are trying to be indistinguishable from must be valued by the censor. Collateral damage provides a way to make statements about censorship resistance quantifiable, at least in a loose sense. -Rather than saying, ``the censor cannot block $X$,''\index{unblockability} -or even, ``the censor is unwilling to block $X$,'' -it is better to say ``in order to block $X$, the censor would have to do $Y$,'' -where $Y$ is some action bearing a cost for the censor. +Rather than saying, ``the censor cannot block~$X$,''\index{unblockability} +or even, ``the censor is unwilling to block~$X$,'' +it is better to say ``in order to block~$X$, the censor would have to do~$Y$,'' +where~$Y$ is some action bearing a cost for the censor. A statement like this makes it clear that some censors may be able to afford the cost of blocking and others may not; there is no ``unblockability\index{unblockability}'' in absolute terms. -Now, actually quantifying the value of $Y$ is a task in itself, +Now, actually quantifying the value of~$Y$ is a task in itself, by no means a trivial one. A~challenge for future work in this field is to assign actual numbers (e.g., in dollars) to the costs borne by censors. @@ -805,7 +805,7 @@ estimated that shutdowns cost millions of dollars per day per 10~million population, the amount depending on a country's level of connectivity. This does not necessarily contradict -the theory of collateral damage\index{collateral damage}. +the theory of collateral damage. It is just that, in some cases, a censor reckons that the benefits of a shutdown @@ -866,7 +866,7 @@ This is not to say that steganography is strictly superior to polymorphism---there are tradeoffs in both directions. Effective mimicry can be difficult to achieve, and in any case its effectiveness can only be judged -against a censor's sensitivity to collateral damage\index{collateral damage}. +against a censor's sensitivity to collateral damage. Whitelisting\index{whitelist}, by its nature, tends to cause more collateral damage than blacklisting\index{blacklist}. And just as obfuscation protocols are @@ -917,14 +917,14 @@ encoding upstream data as crafted requests and downstream data as steganographic images. % Collage~\cite{Burnett2010a}\index{Collage} % Facade~\cite{Jones2014a}\index{Facade} (2014) updates Infranet. -StegoTorus~\indexauthors{\cite{Weinberg2012a}}\index{StegoTorus} (2012) uses custom encoders +StegoTorus~\cite{Weinberg2012a}\index{StegoTorus} uses custom encoders to make traffic resemble common HTTP file types, such as PDF, JavaScript\index{JavaScript}, and Flash. -SkypeMorph~\indexauthors{\cite{Moghaddam2012a}}\index{SkypeMorph} (2012) mimics a Skype\index{Skype} video call. -FreeWave~\indexauthors{\cite{Houmansadr2013a}}\index{FreeWave} (2013) modulates a data stream +SkypeMorph~\cite{Moghaddam2012a}\index{SkypeMorph} mimics a Skype\index{Skype} video call. +FreeWave~\cite{Houmansadr2013a}\index{FreeWave} modulates a data stream into an acoustic signal and transmits it over VoIP\index{VoIP}. Format-transforming encryption, -or FTE~\indexauthors{\cite{Dyer2013a}}\index{FTE} (2013), +or FTE~\cite{Dyer2013a}\index{FTE}, force traffic to conform to a user-specified syntax: if you can describe it, you can imitate it. Despite receiving much research attention, @@ -1299,7 +1299,7 @@ by Clayton et~al.~\indexauthors{\cite{Clayton2006a}}. They found that the firewall would block connections by injecting\index{packet injection} phony TCP\index{TCP} RST\index{RST} packets (which cause the connection to be torn down) -or SYN/ACK\index{SYN}\index{ACK}\index{SYN/ACK)} packets +or SYN/ACK\index{SYN}\index{ACK} packets (which cause the connection to become unsynchronized), and that simply ignoring the anomalous packets rendered blocking ineffective. @@ -1941,7 +1941,7 @@ For example, their 2005 report on Internet filtering in China\index{China}~\cite studied the problem from many perspectives, political, technical, and legal. They tested the extent of filtering -of web sites, search engines, blogs\index{blog}, and email\index{email}. +of web sites, search engines, blogs\index{blogs}, and email\index{email}. They found a number of blocked web sites, some related to news and politics, and some on sensitive subjects such as Tibet\index{Tibet} and Taiwan\index{Taiwan}. @@ -1961,7 +1961,7 @@ that Chinese search engines indexed blocked sites (perhaps having a special exemption from the general firewall policy), but did not return them in search results~\cite{oni-bulletin-005}. Censorship of blogs included keyword blocking\index{keyword filtering} -by domestic blogging\index{blog} services, +by domestic blogging\index{blogs} services, and blocking of external domains such as \nolinkurl{blogspot.com}~\cite{oni-bulletin-008}. Email\index{email} filtering was done by the email providers themselves, @@ -2010,14 +2010,14 @@ a globally distributed Internet measurement network, to examine two case studies of censorship: Turkey's\index{Turkey} ban on social media sites in March 2014 and -Russia's\index{Russia} blocking of certain LiveJournal\index{LiveJournal}\index{social media} blogs\index{blog} in March 2014. +Russia's\index{Russia} blocking of certain LiveJournal\index{LiveJournal}\index{social media} blogs\index{blogs} in March 2014. Atlas allows 4 types of measurements: ping, traceroute, DNS resolution\index{DNS}, and TLS certificate\index{certificate} fetching. In Turkey\index{Turkey}, they found at least six shifts in policy during two weeks of site blocking. They observed an escalation in blocking in Turkey: the authorities first poisoned DNS\index{DNS!poisoning} for -\nolinkurl{twitter.com}\index{twitter.com@\nolinkurl{twitter.com}}, +\nolinkurl{twitter.com}, then blocked the IP addresses of the Google public DNS servers\index{DNS}, then finally blocked Twitter's\index{Twitter}\index{social media} IP addresses directly. In Russia, they found ten unique bogus IP addresses used to poison DNS. @@ -2600,7 +2600,6 @@ User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like \index{Accept (HTTP header)} \index{User-Agent (HTTP header)} \index{Chrome web browser} -\index{Safari web browser} where the `\texttt{XX}' is a number that varies. The intent of this probe seems to be the discovery of servers that are capable of domain fronting\index{domain fronting} for Google\index{Google} services, @@ -3014,7 +3013,7 @@ a different blocking technique than the timeouts we have seen more recently.) In 2015 I~used public reports of blocking and non-blocking of the first batch of default obfs4\index{obfs4} bridges to infer a blocking delay of not less than~15 -and not more than 76~days~\indexauthors{\cite{tor-dev-censorship-lag}}. +and not more than 76~days~\cite{tor-dev-censorship-lag}. \index{Tor!bridge!default|)} As security researchers, are accustomed to making @@ -3095,7 +3094,7 @@ because whether default or not, active probing would cause them to be blocked shortly after their first use. -Bridges are identified by a nickname and a port number\index{Tor!bridge!nickname}. +Bridges are identified by a nickname and a port number\index{Tor!bridge!nicknames}. The nickname is an arbitrary identifier, chosen by the bridge operator. So, for example, ``ndnop3:24215''\index{ndnop3 (Tor bridge)} is one bridge, and ``ndnop3:10527''\index{ndnop3 (Tor bridge)} is another on the same IP address. @@ -3172,7 +3171,7 @@ Port numbers are in chronological order of release. \index{Tor!Browser} \index{Orbot} \index{Tor!bridge!default} -\index{Tor!bridge!nickname} +\index{Tor!bridge!nicknames} \index{ndnop3 (Tor bridge)} \index{ndnop5 (Tor bridge)} \index{riemann (Tor bridge)} @@ -3217,7 +3216,7 @@ from the source code repository, or downloads nightly builds, could discover bridges at this stage. \item[Testing release] -\index{Tor!Browser!releases} +\index{Tor!Browser!releases of} Just prior to a public release, Tor Browser developers send candidate builds to a public mailing list\index{tor-qa mailing list} to solicit @@ -3229,12 +3228,12 @@ Occasionally the developers skip the testing period, such as in the case of an urgent security release. \item[Public release] After testing, the releases are made public -and announced on the Tor Blog\index{Tor!Blog}\index{blog}. +and announced on the Tor Blog\index{Tor!Blog}. A~censor could learn of bridges at this stage -by reading the blog and downloading executables. +by reading the blog\index{blogs} and downloading executables. This is also the stage at which the new bridges begin to have an appreciable number of users. -There are two release tracks of Tor Browser: stable\index{Tor!Browser!releases} and alpha. +There are two release tracks of Tor Browser: stable\index{Tor!Browser!releases of} and alpha. Alpha releases are distinguished by an `a' in their version number, for example 6.5a4. According to Tor Metrics~\cite{tor-metrics-webstats-tb}\index{Tor!Metrics}, @@ -4706,9 +4705,9 @@ based on packet sizes and timing\index{packet size and timing}. The first public release of Tor Browser\index{Tor!Browser} that had a built-in easy-to-use meek client was version 4.0-alpha-1 on August 12, 2014~\cite{tor-blog-tor-browser-364-and-40-alpha-1-are-released}. -This was an alpha release\index{Tor!Browser!releases}, +This was an alpha release\index{Tor!Browser!releases of}, used by fewer users than the stable release. -I~made a blog post explaining how to use it a few days later~\cite{tor-blog-how-use-meek-pluggable-transport}. +I~made a blog post\index{blogs} explaining how to use it a few days later~\cite{tor-blog-how-use-meek-pluggable-transport}. The release and blog post had a positive effect on the number of users, however the absolute numbers from around this time are uncertain, because of a mistake I~made in configuring the meek bridge\index{Tor!bridge}. @@ -4731,7 +4730,7 @@ for a history of monthly costs. Tor Browser\index{Tor!Browser} 4.0~\cite{tor-blog-tor-browser-40-released} was released on October 15, 2014. -It was the first stable\index{Tor!Browser!releases} +It was the first stable\index{Tor!Browser!releases of} (not alpha) release to have meek, and it had an immediate effect on the number of users: which jumped from 50 to~500 within a week. @@ -4742,7 +4741,7 @@ servicing both meek-google\index{meek!meek-google} and meek-azure\index{meek!mee individually showed the same increase.) It was a lesson in user behavior: although meek had been available -in an alpha release\index{Tor!Browser!releases} for two months already, +in an alpha release\index{Tor!Browser!releases of} for two months already, evidently a large number of users did not know of it or chose not to try it until the first stable release. @@ -4988,14 +4987,12 @@ It turned out, later, that it had been no common botnet\index{botnet} misusing meek-google\index{meek!meek-google}, but an organized political hacker group, known as Cozy Bear\index{Cozy Bear} or APT29. -Matthew Dunwoody presented observations to that effect -in a FireEye\index{FireEye} blog post~\indexauthors{\cite{fireeye-apt29_domain_frontin}} -in March 2017. -The malware would install a backdoor that operated over a Tor\index{Tor!onion service} +The group's malware would install a backdoor that operated over a Tor\index{Tor!onion service} onion service\index{onion service}, and used meek for camouflage. -He and Nick Carr had earlier presented those findings at DerbyCon\index{DerbyCon} +Dunwoody and Carr presented these findings at DerbyCon\index{DerbyCon} in September 2016~\indexauthors{\cite{DunwoodyCarrDerbyCon2016}}, -but I~was not aware of them until the blog post. +and in a blog post~\indexauthors{\cite{fireeye-apt29_domain_frontin}} +in March 2017 (which is where I~learned of it). The year 2016 brought the first reports of efforts to block meek. These efforts all had in common that they used TLS fingerprinting\index{TLS!fingerprinting}\index{blocking!by content} @@ -5264,7 +5261,7 @@ My main collaborators on the Snowflake project are Arlo Breault\index{Breault, Arlo}, Mia Gil~Epner\index{Gil Epner, Mia}, Serene Han\index{Han, Serene}, and -Hooman Mohajeri Moghaddam\index{Moghaddam, Hooman Mohajeri}. +Hooman Mohajeri Moghaddam\index{Mohajeri Moghaddam, Hooman}. \section{Design} @@ -5572,7 +5569,7 @@ and values in the server's certificate\index{certificate}. \end{description} Snowflake uses a WebRTC\index{WebRTC} library extracted -from the Chromium web browser\index{Chromium web browser}, +from the Chromium web browser\index{Chrome web browser}, which mitigates some potential dead-parrot distinguishers~\cite{Houmansadr2013b}\index{dead-parrot attacks}. But WebRTC remains complicated