diff --git a/thesis.tex b/thesis.tex index c4ce977..1f64df0 100644 --- a/thesis.tex +++ b/thesis.tex @@ -7,6 +7,7 @@ \usepackage[utf8]{inputenc} +\usepackage{CJKutf8} \usepackage{graphicx} \usepackage{makeidx} \usepackage{microtype} @@ -20,16 +21,16 @@ % biblatex manual: % "When using the hyperref package, it is preferable to load it after biblatex." -\usepackage[backend=biber,maxbibnames=99,backref=true]{biblatex} +\usepackage[backend=biber,bibencoding=utf8,maxbibnames=99,backref=true]{biblatex} % Remove "URL: " prefix from bibliography URLs. Original declaration is from % texmf-dist/tex/latex/biblatex/biblatex.def. \DeclareFieldFormat{url}{\url{#1}} \bibliography{local,censor-local,censor} \usepackage[hidelinks]{hyperref} +\urlstyle{same} \def\chapterautorefname{Chapter} \def\sectionautorefname{Section} \def\figurenutorefname{Figure} -\urlstyle{same} % Disable metadata for reproducible PDF. @@ -44,6 +45,7 @@ \makeindex \hyphenation{Web-RTC} +\hyphenation{GAE-uploader} \usepackage{yfonts} @@ -51,6 +53,7 @@ \begin{document} +\begin{CJK}{UTF8}{gbsn} \frontmatter % Avoid warnings "destination with the same identifier (name{page.1}) has been already used, duplicate ignored". @@ -231,7 +234,7 @@ Other forms of censorship that are \emph{not} in scope include: \item forum moderation and deletion of social media posts \item deletion-resistant publishing in the vein of the Eternity Service~\cite{Anderson1996a} - (what Köpsell and Hillig call ``censorship resistant publishing systems''), + (what Köpsell and Hillig call ``censorship resistant publishing systems''~\cite[\S~1]{Koepsell2004a}), except insofar as access to such services may be blocked % Dagster~\cite{Stubblefield2001a} @@ -269,127 +272,18 @@ to refer to the border firewall case. \dragons +\autoref{chap:principles} +is the ``thesis'' within the thesis. 
-\section{My past work} +experience with Tor -\dragons +pluggable transports My blind spots: VPNs, systems without research documentation (FreeGate, Ultrasurf, Shadowsocks), foreign-language documentation and forums. -\subsection{Flash proxy} - -I began working on censorship circumvention with flash proxy in 2011. -Flash proxy is targeted at the difficult problem -of proxy address blocking: -it is designed against a censor model -in which the censor can block any IP address it chooses, -but only on a relatively slow timeline of several hours. - -Flash proxy works by running tiny JavaScript proxies in -ordinary users' web browsers. -The mini-proxies serve as temporary stepping stones -to a full-fledged proxy, such as a Tor relay. -The idea is that the flash proxies are too numerous, -diverse, and quickly changing to block effectively. -A censored user may use a particular proxy for -only seconds or minutes before switching to another. -If the censor manages to block the IP address of one proxy, -there is little harm, -because many other temporary proxies are ready to take its place. - -The flash proxy system was designed under interesting constraints -imposed by being partly implemented in JavaScript in the browser. -The proxies sent and received data using the WebSocket protocol, -which allows for socket-like -persistent TCP connections in browsers, but with a catch: -the browser can only -make outgoing connections, not receive incoming ones as a traditional proxy would. -The censored client must somehow inform the system of its own public address, -and then the proxy connects \emph{back} to the client. -This architectural constraint was probably -the biggest impediment to the usability of flash proxy, -because it required users to configure their local router -to permit incoming connections. -(Removing this impediment is the main reason -for the development of Snowflake, described later.) 
-Flash proxy does not itself try to obfuscate patterns -in the underlying traffic; -it only provides address diversity. - -For the initial ``rendezvous'' step in which a client advertises -its address and a request for proxy service, -flash proxy uses a neat idea: -a low-capacity, but highly covert channel bootstraps -the high-capacity, general-purpose WebSocket channel. -For example, we implemented an automated email-based rendezvous, -in which the client would send its address in an encrypted email to a special address. -While it is difficult to build a useful low-latency bidirectional channel -on top of email, -email is difficult to block -and it is only needed once, at the beginning of a session. -We later replaced the email-based rendezvous with one based on domain fronting, -which would later inspire -meek, described below. - -I was the leader of the flash proxy project and the main developer of its code. -Flash proxy was among the first circumvention systems built for Tor---only -obfs2 is older. -It was first deployed in Tor Browser in January 2013, -and was later retired in January 2016 -after it ceased to see appreciable use. -Its spirit lives on in Snowflake, now under development. - -Flash proxy appeared in the 2012 research paper -``Evading Censorship with Browser-Based Proxies''~\cite{Fifield2012a-local}, -which I coauthored with -Nate Hardison, Jonathan Ellithorpe, Emily Stark, Roger Dingledine, Phil Porras, and Dan Boneh. - -\subsection{OSS, a circumvention prototype} - -OSS, for ``online scanning service,'' -is a design for circumvention based on the use of -third-party web services that issue HTTP requests -to user-specified destinations, -such as an online translation service. -OSS is designed against the model of a censor that -is unwilling to block useful web services that are used for circumvention, -because of the useful service they provide. 
- -In OSS, the client sends messages to a censored destination -by bouncing them through a third-party scanning service. -The key idea is a deliberate conflation of address and content. -The client asks the scanning service to scan a long URL -that is constructed to encode both the destination host and a data payload. -The destination receives the HTTP request and decodes its payload. -The destination sends data downstream by abusing HTTP redirection, -instructing the scanning service to send another -HTTP request back to the client, with a different payload. -The resistance to blocking of the OSS system hinges -on the abundance of online scanning services that exist. - -% https://trac.torproject.org/projects/tor/ticket/7559 -OSS was never deployed to users. -I judged its overhead and potential to annoy webmasters -to be too great to be practical. -The core idea, however, did see use -as a rendezvous method for flash proxy. -In this method, a helper program -would encode the client's IP address -into a URL\@. The user would then copy and paste the URL into any online scanning service, -which would then forward the information to the flash proxy system. -In fact, this URL encoding was used internally by -the domain fronting--based rendezvous as well, -using a URL as a convenient vehicle for data transfer. - -OSS appeared in the 2013 research paper -``OSS: Using Online Scanning Services for Censorship Circumvention''~\cite{Fifield2013a-local}, -which I coauthored with -Gabi Nakibly and Dan Boneh. - - \chapter{Principles of circumvention} \label{chap:principles} @@ -1047,7 +941,7 @@ Volunteers run bridges, which report themselves to central database called BridgeDB~\cite{BridgeDB}. Clients contact BridgeDB through some unblocked out-of-band channel (HTTPS, email, or word of mouth) in order to learn bridge addresses. -The BridgeDB server takes steps to prevent easy enumeration of the entire database. 
+The BridgeDB server takes steps to prevent easy enumeration of the entire database~\cite{tor-bridgedb-spec}. Each request returns only a small set of bridges, and repeated requests by the same client return the same small set @@ -1118,7 +1012,8 @@ Tor bridges~\cite[\S~4.4]{Durumeric2013a} by scanning all of IPv4 on a handful of common bridge ports. % surf and serve~\cite{McLachlan2009a} (didn't actually scan) % extensive analysis~\cite{Ling2012a} (didn't scan) -Matic et~al. had similar results in 2017~\cite[\S~V.D]{Matic2017a}, +% https://lists.torproject.org/pipermail/tor-dev/2014-December/007957.html (Project Sonar) +Matic et~al.\ had similar results in 2017~\cite[\S~V.D]{Matic2017a}, using public search engines in lieu of active scanning. The best solution to the scanning problem is to do as ScrambleSuit and obfs4 do, @@ -1173,6 +1068,8 @@ network intermediary: a content delivery network. Using properties of HTTPS, a client may request one site while appearing (to the censor) to request another. Domain fronting is the topic of \autoref{chap:domain-fronting}. +The big advantage of this general strategy is +that the proxies do not need to be kept secret from the censor. The final strategy for address blocking resistance is address spoofing. The notable design in this category is @@ -1328,6 +1225,7 @@ Khattak et~al.~\cite{Khattak2013a} applied a wide array of evasion experiments to the Great Firewall in 2013, identifying classes of working evasions and estimating the cost to counteract them. +\todo{Your State is Not Mine~\cite{Wang2017a}} \section{Early censorship and circumvention} @@ -1436,16 +1334,10 @@ just as censors do. % \cite{dit-hj} first report on DNS hijacking? -\chapter{Censor capabilities} +\chapter{Understanding censors} \dragons -This section surveys past measurement studies -in order to draw specific and general conclusions about censor models. 
-The objects of this survey are -based on those in the evaluation study done by -me and others in 2016~\cite[\S IV.A]{Tschantz2016a-local}. - The main tool we have to build relevant threat models is the natural study of censors. The study of censors is complicated by difficulty of access: @@ -1551,419 +1443,42 @@ for example how censors make the decision on what to block, and what bureaucratic and other considerations might cause them to work less than optimally. -One of the earliest technical studies of censorship occurred -not in some illiberal place, but in the German state of -North Rhein-Westphalia. -In 2003, Dornseif~\cite{Dornseif2003a} tested ISPs' implementation -of a controversial legal order to block two Nazi web sites. -While there were many possible ways to implement the block, -none were trivial to implement, nor free of overblocking side effects. -The most popular implementation used \emph{DNS tampering}, -simply returning (or injecting) false responses to DNS requests -for the domain names of the blocked sites. -An in-depth survey of DNS tampering -found a variety of implementations, some blocking more -and some blocking less than required by the order. +informing our threat models -% TODO: \cite{Zittrain2003a} +censors' capabilities---presumed and actual +e.g.\ IP blocking (reaction time?) +active probing -Clayton~\cite{Clayton2006b} in 2006 studied a ``hybrid'' blocking system, -called ``CleanFeed'' by the British ISP BT, -that aimed for a better balance of costs and benefits: -a ``fast path'' IP address and port matcher -acted as a prefilter for the ``slow path,'' a full HTTP proxy. -The system, in use since 2004, -was designed to block access to any of a secret list of -pedophile web sites compiled by a third party. 
-The author identifies ways to circumvent or attack such a system: -use a proxy, use source routing to evade the blocking router, -obfuscate requested URLs, use an alternate IP address or port, -return false DNS results to put third parties on the ``bad'' list. -They demonstrate that the two-level nature of the blocking system -unintentionally makes it an oracle -that can reveal the IP addresses of sites in the secret blocking list. +Internet curfews (Gabon), limited time of shutdowns shows sensitivity to collateral damage. -\cite{OpenNet2008AccessDenied} +commercial firewalls (Citizen Lab) and bespoke systems -For a decade, the OpenNet Initiative produced reports -on Internet filtering and surveillance in dozens of countries, -until it ceased operation in 2014. -For example, their 2005 report on Internet filtering in China~\cite{oni-china-2005} -studied the problem from many perspectives, -political, technical, and legal. -They translated and interpreted Chinese laws relating to the Internet, -which provide strong legal justifications for filtering. -The laws regulate both Internet users and service providers, -including cybercafes. -They prohibit the transfer of information that is indecent, -subversive, false, criminal, or that reveals state secrets. -The OpenNet Initiative tested the extent of filtering -of web sites, search engines, blogs, and email. -They found a number of blocked web sites, -some related to news and politics, and some on sensitive subjects -such as Tibet and Taiwan. -In some cases, entire sites (domains) were blocked; -in others, only specific pages within a larger site were blocked. -In a small number of cases, sites were accessible by -IP address but not by domain name. -There were cases of overblocking: apparently inadvertently blocked sites -that simply shared an IP address or URL keyword -with an intentionally blocked site. 
-On seeing a prohibited keyword, the firewall blocked connections -by injecting a TCP RST packet to tear down the connection, -then injecting a zero-sized TCP window, -which would prevent any communication with the same server -for a short time. -Using technical tricks, the authors inferred -that Chinese search engines index blocked sites -(perhaps having a special exemption from the general firewall policy), -but do not return them in search results. -% https://opennet.net/bulletins/005/ -The firewall blocks access searches for certain keywords on Google -as well as the Google Cache---but the latter could be worked around -by tweaking the format of the URL. -% https://opennet.net/bulletins/006/ -Censorship of blogs comprised keyword blocking -by domestic blogging services, -and blocking of external domains such as blogspot.com. -% https://opennet.net/bulletins/008/ -Email filtering is done by the email providers themselves, -not by an independent network firewall. -Email providers seem to implement their filtering rules -independently and inconsistently: -messages were blocked by some providers and not others. -% More ONI? -% \cite{oni-china-2007} -% \cite{oni-china-2009} -% \cite{oni-china-2012} -% \cite{oni-iran-2005} -% \cite{oni-iran-2007} -% \cite{oni-iran-2009} -In 2006, Clayton, Murdoch, and Watson~\cite{Clayton2006a} -further studied the technical aspects of the Great Firewall of China. -They relied on an observation that the firewall was symmetric, -treating incoming and outgoing traffic equally. -By sending web requests from outside the firewall to a web server inside, -they could provoke the same blocking behavior -that someone on the inside would see. -They sent HTTP requests containing forbidden keywords (e.g., ``falun'') -caused the firewall to inject RST packets -towards both the client and server. -Simply ignoring RST packets (on both ends) -rendered the blocking mostly ineffective. 
-The injected packets had inconsistent TTLs and other anomalies -that enabled their identification. -Rudimentary countermeasures such as splitting keywords -across packets were also effective in avoiding blocking. -The authors of this paper bring up an important point -that would become a major theme of future censorship modeling: -censors are forced to trade blocking effectiveness -against performance. -In order to cope with high load at a reasonable costs, -censors may choose the architecture of a network monitor -or intrusion detection system, -one that can passively monitor and inject packets, -but cannot delay or drop them. +Ongoing, longitudinal measurement of censorship +remains a challenge. +Studies tend to be limited to one geographical region +and one period of time. +Dedicated measurement platforms such as +OONI~\cite{Filasto2012a} and ICLab~\cite{iclab} +are starting to make a dent in this problem, +by providing regular measurements from many locations worldwide. +Even with these, there are challenges around +getting probes into challenging locations +and keeping them running. -A nearly contemporary study by Wolfgarten~\cite{Wolfgarten2006a} -reproduced many of the results of Clayton, Murdoch, and Watson. -Using a rented server in China, the author found cases of -DNS tampering, search engine filtering, and RST injection -caused by keyword sniffing. -Not much later, in 2007, Lowe, Winters, and Marcus~\cite{Lowe2007a} -did detailed experiments on DNS tampering in China. -They tested about 1,600 recursive DNS servers in China -against a list of about 950 likely-censored domains. -For about 400 domains, responses came back with bogus IP addresses, -chosen from a set of about 20 distinct IP addresses. -Eight of the bogus addresses were used more than the others: -a whois lookup placed them in Australia, Canada, China, Hong Kong, and the U.S. 
-By manipulating TTLs, the authors found that the false responses -were injected by an intermediate router: -the authentic response would be received as well, only later. -A more comprehensive survey~\cite{Anonymous2014a} -of DNS tampering and injection occurred in 2014, -giving remarkable insight into the internal structure -of the censorship machines. -DNS injection happens only at border routers. -IP ID and TTL analysis show that each node -is a cluster of several hundred processes -that collectively inject censored responses. -They found 174 bogus IP addresses, more than previously documented. -They extracted a blacklist of about 15,000 keywords. +Apart from a few reports of, for example, +per annum spending on filtering hardware, +not much is known about how much censorship costs to implement. +In general, contemporary threat models tend to ignore +resource limitations on the part of the censor. -\cite{Wright2012a} +Tying questions of ethics\index{ethics} to questions about censor behavior, motivation: +\cite{Wright2011a} (also mentions ``organisational requirements, administrative burden'') +\cite{Jones2015a} +\cite{Crandall2015a} +Censors may come to conclusions different than what we expect +(have a clue or not). -The Great Firewall, because of its unusual sophistication, -has been an enduring object of study. -Part of what makes it interesting is its many blocking modalities, -both active and passive, proactive and reactive. -The ConceptDoppler project of Crandall et~al.~\cite{Crandall2007a} -measured keyword filtering by the Great Firewall -and showed how to discover new keywords automatically -by latent semantic analysis, using -the Chinese-language Wikipedia as a corpus. -They found limited statefulness in the firewall: -sending a naked HTTP request -without a preceding SYN resulted in no blocking. -In 2008 and 2009, Park and Crandall~\cite{Park2010a} -further tested keyword filtering of HTTP responses. 
-Injecting RST packets into responses is more difficult -than doing the same to requests, -because of the greater uncertainty in predicting -TCP sequence numbers -once a session is well underway. -In fact, RST injection into responses was hit or miss, -succeeding only 51\% of the time, -with some, apparently diurnal, variation. -They also found inconsistencies in the statefulness of the firewall. -Two of ten injection servers would react to a naked HTTP request; -that it, one sent outside of an established TCP connection. -The remaining eight of ten required an established TCP connection. -Xu et~al.~\cite{Xu2011a} continued the theme of keyword filtering in 2011, -with the goal of -discovering where filters are located at the IP and AS levels. -Most filtering is done at border networks -(autonomous systems with at least one non-Chinese peer). -In their measurements, the firewall was fully stateful: -blocking was never -triggered by an HTTP request outside an established TCP connection. -Much filtering occurs -at smaller regional providers, -rather than on the network backbone. - -Winter and Lindskog~\cite{Winter2012a} -did a formal investigation into active probing, -a reported capability of the Great Firewall since around October 2011. -They focused on the firewall's probing of Tor relays. -Using private Tor relays in Singapore, Sweden, and Russia, -they provoked active probes by -simulating Tor connections, collecting 3295 firewall scans over 17 days. -Over half the scan came from a single IP address in China; -the remainder seemingly came from ISP pools. -Active probing -is initiated every 15 minutes and each burst lasts for about 10 minutes. - -Sfakianakis et~al.~\cite{Sfakianakis2011a} -built CensMon, a system for testing web censorship -using PlanetLab nodes as distributed measurement points. -They ran the system for for 14 days in 2011 across 33 countries, -testing about 5,000 unique URLs. -They found 193 blocked domain–country pairs, 176 of them in China. 
-CensMon reports the mechanism of blocking. -Across all nodes, it was -18.2\% DNS tampering, -33.3\% IP address blocking, and -48.5\% HTTP keyword filtering. -The system was not run on a continuing basis. -Verkamp and Gupta~\cite{Verkamp2012a} -did a separate study in 11 countries, -using a combination of PlanetLab nodes -and the computers of volunteers. -Censorship techniques vary across countries; -for example, some show overt block pages and others do not. -China was the only stateful censor of the 11. -% \cite{Mathrani2010a} - -PlanetLab is a system that was not originally designed for censorship measurement, -that was later adapted for that purpose. -Another recent example is RIPE Atlas, -a globally distributed Internet -measurement network consisting of physical probes hosted by volunteers, -Atlas allows 4 types of measurements: ping, traceroute, DNS resolution, -and X.509 certificate fetching. -Anderson et~al.~\cite{Anderson2014a} -used Atlas to examine two case studies of censorship: -Turkey's ban on social media sites in March 2014 and -Russia's blocking of certain LiveJournal blogs in March 2014. -In Turkey, they -found at least six shifts in policy during two weeks of site blocking. -They observed an escalation in blocking in Turkey: -the authorities first poisoned DNS for twitter.com, -then blocked the IP addresses of the Google public DNS servers, -then finally blocked Twitter's IP addresses directly. -In Russia, they found ten unique bogus IP addresses used to poison DNS. - -Most research on censors has focused on the blocking of -specific web sites and HTTP keywords. -A few studies have looked at less discriminating -forms of censorship: outright shutdowns and throttling without fully blocking. -Dainotti et~al.~\cite{Dainotti2011a} -reported on the total Internet shutdowns -that took place in Egypt and Libya -in the early months of 2011. 
-They used multiple measurements to document -the outages as they occurred: -BGP data, a large network telescope, and active traceroutes. -During outages, there was a drop in scanning traffic -(mainly from the Conficker botnet) to their telescope. -By comparing these different measurements, -they showed that the shutdown in Libya was accomplished -in more that one way, -both by altering network routes and by firewalls dropping packets. -Anderson~\cite{Anderson2013a} -documented network throttling in Iran, -which occurred over two major periods between 2011 and 2012. -Throttling degrades network access without totally blocking it, -and is harder to detect than blocking. -The author argues that a hallmark of throttling -is a decrease in network throughput -without an accompanying increase in latency and packet loss, -distinguishing throttling from ordinary network congestion. -Academic institutions were affected by throttling, -but less so than other networks. -Aryan et~al.~\cite{Aryan2013a} -tested censorship in Iran -during the two months before the June 2013 presidential election. -They found multiple blocking methods: HTTP request keyword filtering, -DNS tampering, and throttling. -The most usual method was HTTP request filtering. -DNS tampering (directing to a blackhole IP address) -affected only three domains: -facebook.com, -youtube.com, and -plus.google.com. -SSH connections were throttled down to about 15\% -of the link capacity, -while randomized protocols were throttled almost down to zero -60 seconds into a connection's lifetime. -Throttling seemed to be achieved by dropping packets, thereby -forcing TCP's usual recovery. - -Khattak et~al.~\cite{Khattak2013a} -evaluated the Great Firewall from the perspective that it works like -an intrusion detection system or network monitor, -and applied existing technique for evading a monitor -the the problem of circumvention. 
-They looked particularly for ways to -evade detection that are expensive for the censor to remedy. -They found that the firewall is stateful, -but only in the client-to-server direction. -The firewall is vulnerable to a variety of TCP- and HTTP-based evasion -techniques, such as overlapping fragments, TTL-limited packets, -and URL encodings. - -Nabi~\cite{Nabi2013a} -investigated web censorship in Pakistan in 2013, using a publicly known -list of banned web sites. -They tested on 5 different networks in Pakistan. -Over half of the sites on the list were blocked by DNS tampering; -less than 2\% were additionally blocked by HTTP filtering -(an injected redirection before April 2013, -or a static block page after that). -They conducted a small survey to find the most -commonly used circumvention methods in Pakistan. -The most used method was public VPNs, at 45\% of respondents. - -Ensafi et~al.~\cite{Ensafi2015a} -employed an intriguing technique to measure censorship -from many locations in China---a ``hybrid idle scan.'' -The hybrid idle scan allows one to test TCP connectivity -between two Internet hosts, -without needing to control either one. -They selected roughly uniformly geographically distributed sites -in China from which to measure connectivity to -Tor relays, Tor directory authorities, -and the web servers of popular Chinese web sites. -There were frequent failures of the firewall resulting -in temporary connectivity, typically lasting in bursts of hours. - -In 2015, Marczak et~al.~\cite{Marczak2015a-local} -investigated an innovation in the capabilities -of the border routers of China, -an attack tool dubbed the ``Great Cannon.'' -The cannon was responsible for denial-of-service attacks -on Amazon CloudFront and GitHub. -The unwitting participants in the attack were -web browsers located outside of China, -who began their attack when the cannon injected -malicious JavaScript into certain HTTP responses -originating in China. 
-The new attack tool is noteworthy because it demonstrated -previously unseen in-path behavior, -such as packet dropping. - -Not every censor is China, -with its sophisticated homegrown firewall. -A major aspect of censor modeling is that -many censors use commercial firewall hardware. -A case in point is the analysis by -Chaabane et~al.~\cite{Chaabane2014a} -of 600 GB of leaked logs from Blue Coat proxies -used for censorship in Syria. -The logs cover 9 days in July and August 2011, -and contain an entry for every HTTP request. -The authors of the study found evidence of IP address blocking, -domain name blocking, and HTTP request keyword blocking, -and also of users circumventing censorship -by downloading circumvention software or using the Google cache. -All subdomains of .il, the top-level domain for Israel, -were blocked, as were many IP address ranges in Israel. -Blocked URL keywords included -``proxy'', -``hotspotshield'', -``israel'', and -``ultrasurf'' -(resulting in collateral damage to the Google Toolbar -and Facebook Like button because they have ``proxy'' in HTTP requests). -Tor was only lightly censored---only one of several proxies -blocked it, and only sporadically. -% \cite{CitizenLab2013opakistan} -% \cite{Marquis2013planet} - -% \cite{Anderson2012splinternet} -% \cite{Dalek2013a} -% \cite{Gill2015a} - -\cite{Gwagwa_a_study_of_internet-based_information_controls_in_rwanda} -and other OONI. - -Analyzing Internet Censorship in Pakistan\cite{Aceto2016a} - -informing our threat models - -censors' capabilities---presumed and actual -e.g. ip blocking (reaction time?) -active probing - -Internet curfews (Gabon), limited time of shutdowns shows sensitivity to collateral damage. - -commercial firewalls (Citizen Lab) and bespoke systems - -\section{Open problems in censor modeling} - -\dragons - -Ongoing, longitudinal measurement of censorship -remains a challenge. -Studies tend to be limited to one geographical region -and one period of time. 
-Dedicated measurement platforms such as -OONI~\cite{Filasto2012a} and ICLab~\cite{iclab} -are starting to make a dent in this problem, -by providing regular measurements from many locations worldwide. -Even with these, there are challenges around -getting probes into challenging locations -and keeping them running. - -Apart from a few reports of, for example, -per annum spending on filtering hardware, -not much is known about how much censorship costs to implement. -In general, contemporary threat models tend to ignore -resource limitations on the part of the censor. - -Tying questions of ethics\index{ethics} to questions about censor behavior, motivation: -\cite{Wright2011a} (also mentions ``organisational requirements, administrative burden'') -\cite{Jones2015a} -\cite{Crandall2015a} -Censors may come to conclusions different than what we expect -(have a clue or not). - - -\chapter{Circumvention systems} - -\dragons Evaluating the quality of circumvention systems is tricky, whether they are only proposed or actually deployed. @@ -2049,12 +1564,54 @@ I helped analyze the network ``fingerprints'' of active probes and how they might be distinguished from connections by legitimate clients. -The work on active probing appeared in the 2015 research paper -``Examining How the Great Firewall Discovers Hidden Circumvention Servers''~\cite{Ensafi2015b}, -which I coauthored with -Roya Ensafi, Philipp Winter, Nick Feamster, Nicholas Weaver, Vern Paxson. - - +\begin{figure} +\centering +\includegraphics{figures/active-probing-http} +\caption{ +Active probes received at my web server +over five years. +This is an updated version of Figure~8 +in our paper ``Examining How the Great Firewall +Discovers Hidden Circumvention Servers''; +the vertical blue stripe divides old and new data. 
+Active probing activity---at least against this server---has +subsided since 2016. +} +\label{fig:active-probing-http} +\end{figure} + +\begin{table} +\begin{tabular}{ll} +August 2010 & Leif Nixon notices strange connections from China in his SSH logs~\cite{Nixon-sshprobes}. \\ +November 2011 & Leif Nixon publishes observations and speculation about the strange SSH connections~\cite{Nixon-sshprobes}. +\end{tabular} +\caption{ +Timeline of active probing. +} +\label{tab:active-probing-timeline} +\end{table} + +The work on active probing appeared in the 2015 research paper +``Examining How the Great Firewall Discovers Hidden Circumvention Servers''~\cite{Ensafi2015b}, +which I coauthored with +Roya Ensafi, Philipp Winter, Nick Feamster, Nicholas Weaver, and Vern Paxson. + +Dingledine and Mathewson~\cite[\S~9.3]{tor-techreport-2006-11-001} +McLachlan and Hopper~\cite{McLachlan2009a} +Ling et~al.~\cite{Ling2012a} +Dingledine~\cite{tor-techreport-2011-10-002} + +breakwa11 documented an active-probing vulnerability +in Shadowsocks in 2015(?) +but no evidence of probing for it. +\cite{BlessingStudio-why-do-shadowsocks-deprecate-ota} +\cite{ProgramThink-comment1508314948860} + +\section{Fingerprinting the probers} + +\dragons + + \chapter{Time delays in censors' reactions} \label{chap:proxy-probe} @@ -2119,6 +1676,15 @@ at different network layers. \label{fig:domain-fronting} \end{figure} +\section{Related work on domain fronting} + +\cite{Koepsell2004a} +Bryce Boe +GoAgent +flashproxy + + + Domain fronting assumes a rather strong censor model, essentially equivalent to the state of the art of national censors at the time of its popularization. @@ -2180,6 +1746,21 @@ with whom I continue to collaborate. Today, meek is Tor's second-most-used transport, carrying around 10 terabytes of user traffic each month. 
+Köpsell and Hillig +were ahead of the game when in 2004 they posed +a hypothetical situation~\cite[\S~5.2]{Koepsell2004a}: +``Imagine that all web pages of the United States are only +retrievable (from abroad) by sending encrypted request to +one and only one special node. +Clearly this idea belongs to +the `all or nothing' concept because a blocker has to block +all requests to this node.'' +The situation they describe---one server +hosting many sites, encrypted and indistinguishably---is +not far off from what exists today with CDNs and HTTPS. +Domain fronting removes the last remaining easy distinguisher, +the domain name that appears in the clear. + Domain fronting appeared in the 2015 research paper ``Blocking-resistant communication through domain fronting''~\cite{Fifield2015a-local}, which I coauthored with Chang Lan, Rod Hynes, Percy Wegmann, and Vern Paxson. @@ -2465,7 +2046,7 @@ said that recently they had altered their systems to prevent domain fronting by enforcing a match between SNI and Host header~\cite{PrinceCloudflareHackerNews}. GreatFire, an anticensorship organization that had also been mentioned, shortly thereafter experienced a new type of denial-of-service attack~\cite{greatfire-we-are-under-attack}, -caused by a Chinese network attack system later called the ``Great Cannon''~\cite{citizenlab-great-cannon}. +caused by a Chinese network attack system later called the ``Great Cannon''~\cite{Marczak2015a-local}. They blamed the attack on the attention brought by the news article. Since initial deployment, the Azure backend @@ -2823,7 +2404,7 @@ It keeps the basic idea of in-browser proxies while fixing the usability problems that hampered the adoption of flash proxy. My main collaborators in this project are -Serene Han and Arlo Breault. +Arlo Breault, Serene Han, Mia Gil Epner, and Hooman Mohajeri. The key difference between flash proxy and Snowflake is the basic communications protocol between client and browser proxy. 
@@ -2852,9 +2433,77 @@ to a new challenge. Most of the available documentation on Snowflake is linked from the project's wiki page~\cite{snowflake-wiki}. -Mia Gil Epner and I wrote a preprint on the fingerprinting +Mia Gil Epner and I wrote a technical report on the fingerprinting hazards of WebRTC~\cite{FifieldGilEpnerWebRTC}. +\subsection{Flash proxy} + +I began working on censorship circumvention with flash proxy in 2011. +Flash proxy is targeted at the difficult problem +of proxy address blocking: +it is designed against a censor model +in which the censor can block any IP address it chooses, +but only on a relatively slow timeline of several hours. + +Flash proxy works by running tiny JavaScript proxies in +ordinary users' web browsers. +The mini-proxies serve as temporary stepping stones +to a full-fledged proxy, such as a Tor relay. +The idea is that the flash proxies are too numerous, +diverse, and quickly changing to block effectively. +A censored user may use a particular proxy for +only seconds or minutes before switching to another. +If the censor manages to block the IP address of one proxy, +there is little harm, +because many other temporary proxies are ready to take its place. + +The flash proxy system was designed under interesting constraints +imposed by being partly implemented in JavaScript in the browser. +The proxies sent and received data using the WebSocket protocol, +which allows for socket-like +persistent TCP connections in browsers, but with a catch: +the browser can only +make outgoing connections, not receive incoming ones as a traditional proxy would. +The censored client must somehow inform the system of its own public address, +and then the proxy connects \emph{back} to the client. +This architectural constraint was probably +the biggest impediment to the usability of flash proxy, +because it required users to configure their local router +to permit incoming connections. 
+(Removing this impediment is the main reason +for the development of Snowflake, described later.) +Flash proxy does not itself try to obfuscate patterns +in the underlying traffic; +it only provides address diversity. + +For the initial ``rendezvous'' step in which a client advertises +its address and a request for proxy service, +flash proxy uses a neat idea: +a low-capacity, but highly covert channel bootstraps +the high-capacity, general-purpose WebSocket channel. +For example, we implemented an automated email-based rendezvous, +in which the client would send its address in an encrypted email to a special address. +While it is difficult to build a useful low-latency bidirectional channel +on top of email, +email is difficult to block +and it is only needed once, at the beginning of a session. +We later replaced the email-based rendezvous with one based on domain fronting, +which would later inspire +meek, described below. + +I was the leader of the flash proxy project and the main developer of its code. +Flash proxy was among the first circumvention systems built for Tor---only +obfs2 is older. +It was first deployed in Tor Browser in January 2013, +and was later retired in January 2016 +after it ceased to see appreciable use. +Its spirit lives on in Snowflake, now under development. + +Flash proxy appeared in the 2012 research paper +``Evading Censorship with Browser-Based Proxies''~\cite{Fifield2012a-local}, +which I coauthored with +Nate Hardison, Jonathan Ellithorpe, Emily Stark, Roger Dingledine, Phil Porras, and Dan Boneh. + % \section{How does it end?} % Probably the circumstances of the world change @@ -2862,10 +2511,392 @@ hazards of WebRTC~\cite{FifieldGilEpnerWebRTC}. % How can we reach that moment favorably? % (Spend the war winning, not losing.) +\appendix + +\chapter{Summary of censorship measurement studies} + +Here I survey past measurement studies +which have helped to build models +about censor behavior in general. 
+The objects of the survey are +based on those in an evaluation study done by +me and others in 2016~\cite[\S IV.A]{Tschantz2016a-local}. + +One of the earliest technical studies of censorship occurred +not in some illiberal place, but in the German state of +North Rhine-Westphalia. +In 2003, Dornseif~\cite{Dornseif2003a} tested ISPs' implementation +of a controversial legal order to block two Nazi web sites. +While there were many possible ways to implement the block, +none were trivial to implement, nor free of overblocking side effects. +The most popular implementation used \emph{DNS tampering}, +simply returning (or injecting) false responses to DNS requests +for the domain names of the blocked sites. +An in-depth survey of DNS tampering +found a variety of implementations, some blocking more +and some blocking less than required by the order. + +% TODO: \cite{Zittrain2003a} + +Clayton~\cite{Clayton2006b} in 2006 studied a ``hybrid'' blocking system, +called ``CleanFeed'' by the British ISP BT, +that aimed for a better balance of costs and benefits: +a ``fast path'' IP address and port matcher +acted as a prefilter for the ``slow path,'' a full HTTP proxy. +The system, in use since 2004, +was designed to block access to any of a secret list of +pedophile web sites compiled by a third party. +The author identifies ways to circumvent or attack such a system: +use a proxy, use source routing to evade the blocking router, +obfuscate requested URLs, use an alternate IP address or port, +return false DNS results to put third parties on the ``bad'' list. +They demonstrate that the two-level nature of the blocking system +unintentionally makes it an oracle +that can reveal the IP addresses of sites in the secret blocking list. + +\cite{OpenNet2008AccessDenied} + +For a decade, the OpenNet Initiative produced reports +on Internet filtering and surveillance in dozens of countries, +until it ceased operation in 2014.
+For example, their 2005 report on Internet filtering in China~\cite{oni-china-2005} +studied the problem from many perspectives, +political, technical, and legal. +They translated and interpreted Chinese laws relating to the Internet, +which provide strong legal justifications for filtering. +The laws regulate both Internet users and service providers, +including cybercafes. +They prohibit the transfer of information that is indecent, +subversive, false, criminal, or that reveals state secrets. +The OpenNet Initiative tested the extent of filtering +of web sites, search engines, blogs, and email. +They found a number of blocked web sites, +some related to news and politics, and some on sensitive subjects +such as Tibet and Taiwan. +In some cases, entire sites (domains) were blocked; +in others, only specific pages within a larger site were blocked. +In a small number of cases, sites were accessible by +IP address but not by domain name. +There were cases of overblocking: apparently inadvertently blocked sites +that simply shared an IP address or URL keyword +with an intentionally blocked site. +On seeing a prohibited keyword, the firewall blocked connections +by injecting a TCP RST packet to tear down the connection, +then injecting a zero-sized TCP window, +which would prevent any communication with the same server +for a short time. +Using technical tricks, the authors inferred +that Chinese search engines index blocked sites +(perhaps having a special exemption from the general firewall policy), +but do not return them in search results. +% https://opennet.net/bulletins/005/ +The firewall blocked searches for certain keywords on Google +as well as the Google Cache---but the latter could be worked around +by tweaking the format of the URL. +% https://opennet.net/bulletins/006/ +Censorship of blogs comprised keyword blocking +by domestic blogging services, +and blocking of external domains such as blogspot.com.
+% https://opennet.net/bulletins/008/ +Email filtering is done by the email providers themselves, +not by an independent network firewall. +Email providers seem to implement their filtering rules +independently and inconsistently: +messages were blocked by some providers and not others. +% More ONI? +% \cite{oni-china-2007} +% \cite{oni-china-2009} +% \cite{oni-china-2012} +% \cite{oni-iran-2005} +% \cite{oni-iran-2007} +% \cite{oni-iran-2009} + +In 2006, Clayton, Murdoch, and Watson~\cite{Clayton2006a} +further studied the technical aspects of the Great Firewall of China. +They relied on an observation that the firewall was symmetric, +treating incoming and outgoing traffic equally. +By sending web requests from outside the firewall to a web server inside, +they could provoke the same blocking behavior +that someone on the inside would see. +HTTP requests containing forbidden keywords (e.g., ``falun'') +caused the firewall to inject RST packets +towards both the client and server. +Simply ignoring RST packets (on both ends) +rendered the blocking mostly ineffective. +The injected packets had inconsistent TTLs and other anomalies +that enabled their identification. +Rudimentary countermeasures such as splitting keywords +across packets were also effective in avoiding blocking. +The authors of this paper bring up an important point +that would become a major theme of future censorship modeling: +censors are forced to trade blocking effectiveness +against performance. +In order to cope with high load at a reasonable cost, +censors may choose the architecture of a network monitor +or intrusion detection system, +one that can passively monitor and inject packets, +but cannot delay or drop them. + +A nearly contemporary study by Wolfgarten~\cite{Wolfgarten2006a} +reproduced many of the results of Clayton, Murdoch, and Watson.
+Using a rented server in China, the author found cases of +DNS tampering, search engine filtering, and RST injection +caused by keyword sniffing. +Not much later, in 2007, Lowe, Winters, and Marcus~\cite{Lowe2007a} +did detailed experiments on DNS tampering in China. +They tested about 1,600 recursive DNS servers in China +against a list of about 950 likely-censored domains. +For about 400 domains, responses came back with bogus IP addresses, +chosen from a set of about 20 distinct IP addresses. +Eight of the bogus addresses were used more than the others: +a whois lookup placed them in Australia, Canada, China, Hong Kong, and the U.S. +By manipulating TTLs, the authors found that the false responses +were injected by an intermediate router: +the authentic response would be received as well, only later. +A more comprehensive survey~\cite{Anonymous2014a} +of DNS tampering and injection occurred in 2014, +giving remarkable insight into the internal structure +of the censorship machines. +DNS injection happens only at border routers. +IP ID and TTL analysis show that each node +is a cluster of several hundred processes +that collectively inject censored responses. +They found 174 bogus IP addresses, more than previously documented. +They extracted a blacklist of about 15,000 keywords. + +\cite{Wright2012a} + +The Great Firewall, because of its unusual sophistication, +has been an enduring object of study. +Part of what makes it interesting is its many blocking modalities, +both active and passive, proactive and reactive. +The ConceptDoppler project of Crandall et~al.~\cite{Crandall2007a} +measured keyword filtering by the Great Firewall +and showed how to discover new keywords automatically +by latent semantic analysis, using +the Chinese-language Wikipedia as a corpus. +They found limited statefulness in the firewall: +sending a naked HTTP request +without a preceding SYN resulted in no blocking. 
+In 2008 and 2009, Park and Crandall~\cite{Park2010a} +further tested keyword filtering of HTTP responses. +Injecting RST packets into responses is more difficult +than doing the same to requests, +because of the greater uncertainty in predicting +TCP sequence numbers +once a session is well underway. +In fact, RST injection into responses was hit or miss, +succeeding only 51\% of the time, +with some, apparently diurnal, variation. +They also found inconsistencies in the statefulness of the firewall. +Two of ten injection servers would react to a naked HTTP request; +that is, one sent outside of an established TCP connection. +The remaining eight of ten required an established TCP connection. +Xu et~al.~\cite{Xu2011a} continued the theme of keyword filtering in 2011, +with the goal of +discovering where filters are located at the IP and AS levels. +Most filtering is done at border networks +(autonomous systems with at least one non-Chinese peer). +In their measurements, the firewall was fully stateful: +blocking was never +triggered by an HTTP request outside an established TCP connection. +Much filtering occurs +at smaller regional providers, +rather than on the network backbone. + +Winter and Lindskog~\cite{Winter2012a} +did a formal investigation into active probing, +a reported capability of the Great Firewall since around October 2011. +They focused on the firewall's probing of Tor relays. +Using private Tor relays in Singapore, Sweden, and Russia, +they provoked active probes by +simulating Tor connections, collecting 3295 firewall scans over 17 days. +Over half the scans came from a single IP address in China; +the remainder seemingly came from ISP pools. +Active probing +is initiated every 15 minutes and each burst lasts for about 10 minutes. + +Sfakianakis et~al.~\cite{Sfakianakis2011a} +built CensMon, a system for testing web censorship +using PlanetLab nodes as distributed measurement points.
+They ran the system for 14 days in 2011 across 33 countries, +testing about 5,000 unique URLs. +They found 193 blocked domain–country pairs, 176 of them in China. +CensMon reports the mechanism of blocking. +Across all nodes, it was +18.2\% DNS tampering, +33.3\% IP address blocking, and +48.5\% HTTP keyword filtering. +The system was not run on a continuing basis. +Verkamp and Gupta~\cite{Verkamp2012a} +did a separate study in 11 countries, +using a combination of PlanetLab nodes +and the computers of volunteers. +Censorship techniques vary across countries; +for example, some show overt block pages and others do not. +China was the only stateful censor of the 11. +% \cite{Mathrani2010a} + +PlanetLab is a system that was not originally designed for censorship measurement +but was later adapted for that purpose. +Another recent example is RIPE Atlas, +a globally distributed Internet +measurement network consisting of physical probes hosted by volunteers. +Atlas allows 4 types of measurements: ping, traceroute, DNS resolution, +and X.509 certificate fetching. +Anderson et~al.~\cite{Anderson2014a} +used Atlas to examine two case studies of censorship: +Turkey's ban on social media sites in March 2014 and +Russia's blocking of certain LiveJournal blogs in March 2014. +In Turkey, they +found at least six shifts in policy during two weeks of site blocking. +They observed an escalation in blocking in Turkey: +the authorities first poisoned DNS for twitter.com, +then blocked the IP addresses of the Google public DNS servers, +then finally blocked Twitter's IP addresses directly. +In Russia, they found ten unique bogus IP addresses used to poison DNS. + +Most research on censors has focused on the blocking of +specific web sites and HTTP keywords. +A few studies have looked at less discriminating +forms of censorship: outright shutdowns and throttling without fully blocking.
+Dainotti et~al.~\cite{Dainotti2011a} +reported on the total Internet shutdowns +that took place in Egypt and Libya +in the early months of 2011. +They used multiple measurements to document +the outages as they occurred: +BGP data, a large network telescope, and active traceroutes. +During outages, there was a drop in scanning traffic +(mainly from the Conficker botnet) to their telescope. +By comparing these different measurements, +they showed that the shutdown in Libya was accomplished +in more than one way, +both by altering network routes and by firewalls dropping packets. +Anderson~\cite{Anderson2013a-local} +documented network throttling in Iran, +which occurred over two major periods between 2011 and 2012. +Throttling degrades network access without totally blocking it, +and is harder to detect than blocking. +The author argues that a hallmark of throttling +is a decrease in network throughput +without an accompanying increase in latency and packet loss, +distinguishing throttling from ordinary network congestion. +Academic institutions were affected by throttling, +but less so than other networks. +Aryan et~al.~\cite{Aryan2013a} +tested censorship in Iran +during the two months before the June 2013 presidential election. +They found multiple blocking methods: HTTP request keyword filtering, +DNS tampering, and throttling. +The most common method was HTTP request filtering. +DNS tampering (directing to a blackhole IP address) +affected only three domains: +facebook.com, +youtube.com, and +plus.google.com. +SSH connections were throttled down to about 15\% +of the link capacity, +while randomized protocols were throttled almost down to zero +60 seconds into a connection's lifetime. +Throttling seemed to be achieved by dropping packets, thereby +forcing TCP's usual recovery.
+ +Khattak et~al.~\cite{Khattak2013a} +evaluated the Great Firewall from the perspective that it works like +an intrusion detection system or network monitor, +and applied existing techniques for evading a monitor +to the problem of circumvention. +They looked particularly for ways to +evade detection that are expensive for the censor to remedy. +They found that the firewall is stateful, +but only in the client-to-server direction. +The firewall is vulnerable to a variety of TCP- and HTTP-based evasion +techniques, such as overlapping fragments, TTL-limited packets, +and URL encodings. + +Nabi~\cite{Nabi2013a} +investigated web censorship in Pakistan in 2013, using a publicly known +list of banned web sites. +They tested on 5 different networks in Pakistan. +Over half of the sites on the list were blocked by DNS tampering; +less than 2\% were additionally blocked by HTTP filtering +(an injected redirection before April 2013, +or a static block page after that). +They conducted a small survey to find the most +commonly used circumvention methods in Pakistan. +The most used method was public VPNs, at 45\% of respondents. + +Ensafi et~al.~\cite{Ensafi2015a} +employed an intriguing technique to measure censorship +from many locations in China---a ``hybrid idle scan.'' +The hybrid idle scan allows one to test TCP connectivity +between two Internet hosts, +without needing to control either one. +They selected roughly uniformly geographically distributed sites +in China from which to measure connectivity to +Tor relays, Tor directory authorities, +and the web servers of popular Chinese web sites. +There were frequent failures of the firewall resulting +in temporary connectivity, typically lasting in bursts of hours.
+ +In 2015, Marczak et~al.~\cite{Marczak2015a-local} +investigated an innovation in the capabilities +of the border routers of China, +an attack tool dubbed the ``Great Cannon.'' +The cannon was responsible for denial-of-service attacks +on Amazon CloudFront and GitHub. +The unwitting participants in the attack were +web browsers located outside of China, +who began their attack when the cannon injected +malicious JavaScript into certain HTTP responses +originating in China. +The new attack tool is noteworthy because it demonstrated +previously unseen in-path behavior, +such as packet dropping. + +Not every censor is China, +with its sophisticated homegrown firewall. +A major aspect of censor modeling is that +many censors use commercial firewall hardware. +A case in point is the analysis by +Chaabane et~al.~\cite{Chaabane2014a} +of 600 GB of leaked logs from Blue Coat proxies +used for censorship in Syria. +The logs cover 9 days in July and August 2011, +and contain an entry for every HTTP request. +The authors of the study found evidence of IP address blocking, +domain name blocking, and HTTP request keyword blocking, +and also of users circumventing censorship +by downloading circumvention software or using the Google cache. +All subdomains of .il, the top-level domain for Israel, +were blocked, as were many IP address ranges in Israel. +Blocked URL keywords included +``proxy'', +``hotspotshield'', +``israel'', and +``ultrasurf'' +(resulting in collateral damage to the Google Toolbar +and Facebook Like button because they have ``proxy'' in HTTP requests). +Tor was only lightly censored---only one of several proxies +blocked it, and only sporadically. +% \cite{CitizenLab2013opakistan} +% \cite{Marquis2013planet} + +% \cite{Anderson2012splinternet} +% \cite{Dalek2013a} +% \cite{Gill2015a} + +\cite{Gwagwa_a_study_of_internet-based_information_controls_in_rwanda} +and other OONI. 
+ +Analyzing Internet Censorship in Pakistan\cite{Aceto2016a} + \backmatter \printbibliography % \printindex +\end{CJK} \end{document}