This is an English translation of the research paper "一种新的轻量级安全代理协议", published in the Journal of Cyber Security 6(3).



A new lightweight secure proxy protocol

Journal of Cyber Security
Vol. 6 No. 3
May 2021

Department of Information Security, School of Computer Science, Central China Normal University, Wuhan, China 430079

Keywords
proxy; traffic shaping; traffic obfuscation; TLS; Socks5; DPI
CLC category number
TP309.2
DOI
10.19363/J.cnki.cn10-1380/tn.2021.05.07
Corresponding authors
陈嘉耕 (Chen Jiageng), Ph.D., Associate Professor, Email: jiageng.chen@mail.ccnu.edu.cn.

This project is funded by the National Natural Science Foundation of China (No. 61702212).

Date received
2020-08-02
Date revised
2020-11-09
Date accepted
2021-03-05

Abstract

With the development and widespread application of network technology, establishing secure channels in the Internet environment has become increasingly important. We have designed a lightweight secure proxy protocol that uses the handshake protocol framework of TLSv1.3 and provides better concealment and performance in addition to security. The user interface of the proxy program is based on the Socks5 protocol, ensuring universality. The handshake process is simulated by filling the actual parameters into the random and encrypted fields of the TLSv1.3 handshake packets, completing a handshake based on ECDHE key exchange and a challenge–response mechanism. In the subsequent proxy forwarding process, additional checks are implemented to avoid the repeated processing of encrypted data, significantly improving communication efficiency. To counter active detection, we have designed proactive countermeasures based on TCP forwarding. In terms of robustness, it can serve as a reverse proxy for servers or as a backend for TCP forwarding–based reverse proxy servers, allowing for flexible construction of redundant channels. Based on the implementation principle, it is named FTLSocks, which means Fake TLS Socks. It features a coroutine pool, space reuse, and a minimum-copy and lock-free design. The measured resource consumption, throughput, response time, etc. under high concurrency are better than those of existing popular tools.

1 Introduction

Since the birth of the Internet, various attack and espionage techniques have emerged one after another. Across different types of network boundaries, from homes and businesses to ISPs (Internet Service Providers), there are various forms and degrees of hijacking and espionage threats[1]. To protect the confidentiality and integrity of network communications, certain secure proxies are sometimes necessary. For the sake of convenience, the term “cyber attack” mentioned below refers specifically to attacks within the protective scope of proxies, such as cyber hijacking and espionage, and does not include other types of attacks. Similarly, the term “defense” mentioned below refers to the defense against the aforementioned attacks.

There are six main design policies for secure proxies[1], namely Collateral Damage, Outside Scope of Influence, Rate Limit, Decoupled Communication, Overwhelm, and No Target. Collateral Damage means that the attacker must face significant collateral economic losses due to the blocking of a particular protocol, service, or application; Outside Scope of Influence typically refers to seeking entities that cross boundaries or are not under attack; Rate Limit may be a limitation on communication (frequency of channel usage, throughput, etc.) or a limitation on attack efficiency (CAPTCHA, etc.); Decoupled Communication mainly refers to asymmetric communication (which may involve asymmetries in routing paths, timing, protocols, applications, etc.) and out-of-band transmission; Overwhelm typically refers to the deployment of a large number of servers or communication nodes; No Target means making illegal traffic difficult to distinguish (through randomization or tunneling, etc.) or hiding network addresses (through routing or manual transmission, etc.). The purpose of the latter four policies is to reduce the chances of detection and force the attacker to expend more resources until the attack becomes economically unfeasible, thereby compelling the attacker to abandon the attack. A combination of two or three policies is usually used in a specific defense scheme.

To date, about two dozen representative tools have been developed, but only one policy combination (Collateral Damage + No Target) is widely used, effective, and has high performance. Representative protocols include Obfs4, Shadowsocks, TLS tunneling, Lampshade, and Obfuscated SSH, with the corresponding tools being Tor, Shadowsocks, Trojan, Lantern, and Psiphon. Let’s focus on the currently most popular TLS tunneling proxy technology. Trojan completely encapsulates the proxy protocol in a TLS channel and authenticates the identity by detecting whether the TLS handshake phase is handled specially as agreed, and provides normal HTTPS services if authentication fails. V2Ray has a well-developed ecosystem and supports multiple protocols, but aside from the additional features that are merely the icing on the cake, the only practically usable protocol is TLS tunneling, which can be configured to achieve Trojan-like functionality, but is not as stealthy as Trojan. Based on the above discussion, we believe that the approach based on “Collateral Damage + No Target” is most conducive to creating a long-term and effective protocol.

This paper is structured as follows: Section 2 elaborates on the current attack and defense situation and provides background information for the design policy of this protocol; Section 3 describes in detail the design scheme and processing details of this protocol; Section 4 analyzes the security of this protocol; Section 5 compares this protocol with other tools in terms of security and performance; Section 6 further discusses application scenarios and usage methods; and Section 7 summarizes the whole paper.

2 Background

In the loose structure of the Internet, unprotected data is at very high risk of having confidential information stolen or communications hijacked. The confrontation between secure proxy techniques for protecting network communications and corresponding attack techniques has been going on for a long time. This section briefly describes the principles and limitations of existing attack and defense techniques.

2.1 Assumptions, methods, and principles

Attacker assumptions: Constrained by computational resource limits and economic interests, the attacker will not completely block the network nor adopt a whitelisting policy. A stronger assumption is that the attacker uses on-path attacks rather than in-path attacks[2].

Defender assumptions: There is a legitimate way to access external information without resorting to an established secure proxy communication channel.

Basic attack methods: The attack process consists of two phases, discovery and blocking. The basic methods of discovery are DPI and single-packet probing. The basic methods of blocking are DNS poisoning and IP blacklisting. The DPI technique inspects application-layer data; single-packet probing checks service responses; DNS poisoning preemptively responds to DNS requests for specific domain names with incorrect resolution results, causing the correct results to be discarded due to lateness. Any request for a blacklisted IP will be preemptively reset with a TCP RST packet, causing the TCP connection to drop.

Basic defense methods: Proxy. By having an unaffected host outside the attack range assist in forwarding the request, one can bypass the attack.

The IP blacklisting method is simple and effective but is difficult to apply in practice, so it cannot cover every blocked service; where blocking relies on DNS poisoning alone, one can use local DNS resolution to temporarily bypass the attack. Specifically, the difficulties are: the large number of IPs that should be blacklisted, the changing nature of the IPs of services, the lack of an easy way to keep track of all the IPs for blacklisted services, and inadequate IPv6 support in devices and tools. The once-popular Go Hosts and IPv6 tunneling solutions exemplify these difficulties.

The proxies initially used include HTTP proxies, Socks proxies, web-based proxies (also known as online proxies), and VPN technology. Commonly used VPNs include IPsec, L2TP, PPTP, OpenVPN, and WireGuard. But none of these four types of proxies was designed to evade attacks. These proxies ensure that blacklisted domain names or IPs do not appear directly in the outer layers of packets, but the first three can be easily detected through expanded DPI checks. VPN technology has important legitimate applications and is highly secure, but it is not effective in defense for two reasons. The first reason is that active VPN connections can be passively detected through packet headers and handshake processes; the second reason is that in certain attack ranges, using a VPN requires registration, and directly blocking unknown VPN connections would not cause collateral damage. These four types of proxies also have some drawbacks. For example, HTTP proxies cannot proxy UDP traffic; VPNs provide only a global proxy, which slows down access to internal websites while exposing internal sites and the address of the VPN server being used[3]; and online proxies cannot protect users’ confidential data.

By expanding the scope of DPI checks, the four types of proxies mentioned above can be easily blocked, and any illegal services based on domain names are no longer able to survive stably. For example, the Host field in the HTTP Header, the SNI field during the HTTPS handshake, and the ServerName field of TLSv1.3 are inspected. Although it takes a lot of resources to conduct a wide-ranging attack using the DPI technique, it is still possible for the scope of inspection to include every outbound connection. There are two methods remaining: one is to frequently change or add new domain names; the other is to use a proxy. The former requires frequent maintenance by the service provider, and the latter requires active use by the visitor.

At this point, the subsequent related technical countermeasures officially enter the scope of discussion of this paper. The defense techniques that have appeared in history can be roughly divided into eight categories: End-to-End, End-to-Middle, Distributed, Centralized, Relay, TLS-Related, Side Channel, and Steganography.

End-to-End is the technique used by most proxies and refers to the situation where a direct connection is made to a proxy server via an IP or domain name. This requires a proxy protocol, which may be either a public protocol or a private protocol. A public protocol may in turn be used through a public implementation or a private implementation. The proxy protocol is usually used in conjunction with the HTTP proxy protocol or the Socks5 protocol as an interface to external services. Every technique is End-to-End except for End-to-Middle, Relay, and Distributed.

End-to-Middle is a method that requires a cooperating server to be placed in the backbone network or on the ISP’s core routing line outside the attack range[4]. This method has several unique characteristics: no user deployment, no contact with the server through access to a specific network address, and significant collateral damage. In the design, the user can attempt to reach the server by accessing any unblocked TLS host within the range of the target ISP and including a hidden label in the request. If the user’s request message happens to pass through the route where the server is deployed, the route will covertly declare its existence and hijack the connection to the proxy target (sometimes requiring that the proxy target also supports TLS connections[5]). In this process, the attacker can discover the presence and approximate route location of the server through passive observation, but cannot learn the specific network address. Reference [6] demonstrates that a strong attacker can change the routing path to bypass ISPs with End-to-Middle servers through techniques such as BGP hijacking, but this comes at the cost of significant additional network resources. Although the author’s intent in designing this method was for most ISPs to install this service, so that attackers would be forced to abandon attacks in regions affected by large-scale network hijacking due to unbearable collateral damage, no ISP is willing to install such a device on its own network equipment[7]. Moreover, the End-to-Middle technique requires the user to share confidential information in order to take advantage of established TLS channels, which is unacceptable in some cases. End-to-Middle is represented by Telex, Decoy Routing, Cirripede, TapDance, MultiFlow, and Waterfall. TapDance is the first on-path version of the End-to-Middle scheme and uses a novel selective ciphertext steganography technique[7]. MultiFlow is a Decoupled Communication version of TapDance[5]. Waterfall is the first End-to-Middle scheme to resist rerouting attacks[8].

Distributed can be divided into Center-Assisted Distributed and Fully Distributed. The former has a fast connection speed, high performance, and good connection reliability, but it relies on a secondary central server. The latter has a slow connection speed and low overall network performance, and sometimes fails to connect, but it does not rely on a central server. The former is represented by Tor and the latter by i2p. Theoretically, Fully Distributed can also be divided into Managed and Self-Organized; compare the P2P protocols ed2k and BitTorrent. Since a Fully Distributed network is more vulnerable in terms of connection speed and reliability when facing attacks (it requires connecting nodes inside and outside the attack range instead of any two nodes, but nodes outside the attack range are few and difficult to find), there is no mature solution yet. In order to achieve Decoupled Communication, some schemes make part of the protocol distributed[9, 10].

Centralized can be divided into services provided in batches, and Distributed-Assisted Centralized. The former is the simplest and most direct way, where many nodes are provided at once for users to choose freely from, and all users are directly connected to nodes in an End-to-End manner. The latter is a complex mutual-assisted mechanism dominated by centralized service clusters, but it allows not only for direct connection through preconfigured addresses, but also for discovering, publishing, and sharing of new service addresses through user nodes. Additionally, it may have users undertake a small number of data forwarding tasks. A typical example is the Lampshade protocol[11].

Relay can be divided into Third-Party Website–Based Relay, Self-Built Service–Based Relay, and CDN-Based Relay. Third-Party Website–Based Relay uses the Collateral Damage policy and relies mainly on free public storage services, cloud storage, and similar services that attackers usually do not block as communication intermediaries. A typical example is CloudTransport[10]. Web-based proxies, which also rely on third-party websites for relaying, are easy to discover, can be blocked, cause no collateral damage, and expose the user’s plaintext communications to the proxy service provider. They were a popular method in the early days of defense techniques. Self-Built Service–Based Relay uses the Overwhelm policy and the Decoupled Communication policy and avoids a single point of failure, and is mainly simple TCP port forwarding. A typical example is DAIP (Distributed Anti-interference Proxy)[11]. CDN-Based Relay uses the Collateral Damage policy to access banned services by leveraging the domain fronting technique. Domain fronting takes advantage of the characteristics of CDN implementations by writing an unblocked domain name on the outside of the TLS request (in the SNI) and the domain name actually to be accessed in the Host header of the inner HTTP request, so that the CDN will establish a TLS connection according to the external identifier and request content according to the internal identifier[9]. This is tantamount to exposing the plaintext to the CDN service provider, but since a service that uses a CDN has already exposed its plaintext to that CDN, this exposure may be acceptable.
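The split between the outer and inner identifiers in domain fronting can be illustrated with a minimal sketch. The function name `build_fronted_request` and the two `.example` domain names are hypothetical and are not taken from any of the cited tools; the sketch only builds the two identifiers and does not open a connection.

```python
def build_fronted_request(front_domain: str, hidden_domain: str, path: str = "/"):
    """Return (sni, http_request) for a domain-fronted HTTPS request.

    sni          -- the name an on-path observer sees in the TLS ClientHello
    http_request -- bytes sent inside the encrypted channel; only the CDN
                    reads the Host header and routes on it
    """
    sni = front_domain
    http_request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hidden_domain}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return sni, http_request


# Hypothetical names: the observer sees only "allowed.example".
sni, http_request = build_fronted_request("allowed.example", "blocked.example")
```

To actually send such a request with the Python standard library, one would pass `sni` as `server_hostname` to `ssl.SSLContext.wrap_socket` and then write `http_request` to the wrapped socket. Whether the CDN honors the inner Host header depends on the CDN.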

TLS-Related is a set of defense techniques derived from exploiting the universality and security of the TLS protocol. There are three main types, namely the domain fronting technique, SNI rewriting, and encrypted data steganography. The domain fronting technique is described above. SNI rewriting is of little practical use as it either directly removes the SNI extension (as per the RFC requirements, the handshake continues even without SNI) or establishes connections using the same certificate for different domain names[12]. Encrypted data steganography, on the other hand, involves somehow concealing the actual data in the TLS communication process. This technique is inefficient and not strong enough to counter traffic analysis[1, 7, 13].

Side Channel and Steganography are typically not used in isolation, but are embedded in a step of other schemes for the temporary transfer of small amounts of information. This technique is used, for example, in the End-to-Middle series schemes.

2.2 Confrontation scenarios

Monitoring techniques are ranked in ascending order of cost as follows: packet filtering, stateless passive DPI of single packets, stateful DPI of multiple packets on the same connection, stateless active probing of single packets, stateful active probing of single packets, correlation analysis of the same type of traffic over a temporal or spatial span, correlation analysis of specific types of traffic within a short span, and network connection metadata aggregation analysis[1, 14, 15, 16]. Methods that require more resources are unlikely to be actually used in the short term unless there is a mandatory requirement.

From the attacker’s point of view, there are only four types of monitoring objects: those that can be passively detected, those that can be actively detected, those that are difficult to detect or have a high false positive rate, and those that are undetectable or normal. The first two types are blocked upon discovery, the third type is blocked during special periods or under special circumstances (where the equilibrium point between damage and utility is raised), and the fourth type cannot be or does not need to be dealt with.

Now let’s take a look at the special cases and how they are dealt with in the context of specific topics from a technical perspective.

Typical examples of objects that can be passively detected include various VPN protocols, original Tor, Socks5, etc. They have distinctive features and can be directly detected and blocked. Typical examples of anti–passive probing proxies include Obfs2 and Obfs3, which appear as random data to passive observers, but active probes can receive clear feedback[17]. The general mode of actual work is to first analyze the traffic characteristics, and if the suspicion level reaches a threshold, send a probe for confirmation[17].

This has led to the development of anti–active probing proxy protocols, characterized by the absence of protocol headers, first-packet authentication, the absence of handshakes or out-of-band handshakes, and the absence of response to unverified requests. Typical examples include Obfs4, Shadowsocks, Lampshade, and Obfuscated SSH.

In the case of Shadowsocks, there have been some twists and turns from its initial invention to reaching the level of resistance to active probing. The security vulnerabilities are caused by the following factors: the absence of an anti-replay mechanism, the absence of integrity protection, the use of stream ciphers, and the server’s decision to disconnect immediately based on whether the value of a particular byte is legitimate. The absence of an anti-replay mechanism means that a particular packet can be sent multiple times, the absence of integrity protection means that the content can be tampered with, the use of stream ciphers means that bytes in a specific location can be precisely modified, and the server's behavior ensures that brute-force attempts are all that is needed. Shadowsocks has since added random delay resistance and a mechanism called OTA (One-Time-Auth), but it remains detectable for almost the same reasons. Block ciphers were later introduced, but the absence of integrity protection led to the possibility of redirection attacks[18]. Since then, the AEAD encryption method has been introduced, but replay attacks are always present due to the fact that the interaction process is of the 0-RTT type. Eventually, a Bloom filter was introduced into the existing active version to mitigate the threat of replay attacks.
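The replay mitigation mentioned above can be sketched as a small Bloom filter. This is an illustration only, under the assumption that the server records one per-connection value (such as a salt or nonce) from the first packet and rejects any connection whose value has probably been seen before. The class name `ReplayFilter` and its parameters are hypothetical and do not describe the actual Shadowsocks implementation.

```python
import hashlib


class ReplayFilter:
    """Bloom filter that remembers per-connection salts (assumed design)."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by prefixing a counter byte.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def seen_before(self, item: bytes) -> bool:
        """Record item; return True if it was (probably) recorded already."""
        seen = True
        for pos in self._positions(item):
            index, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[index] & mask:
                seen = False
                self.bits[index] |= mask
        return seen


replay_filter = ReplayFilter()
first = replay_filter.seen_before(b"salt-from-first-packet")
second = replay_filter.seen_before(b"salt-from-first-packet")
```

A Bloom filter never misses a value it has recorded, but it can report a fresh value as seen, so a small fraction of legitimate connections would be rejected as the filter fills up. A real deployment would also need to rotate or reset the filter periodically.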

Shadowsocks may seem undetectable due to the absence of responses to probes, but in reality it can still be identified with a very low false positive rate by a combination of seven indicators in addition to looking for design flaws[17]. The seven indicators are, specifically: time-out period, byte count threshold for triggering a specific connection closure (closed with a FIN packet if there is unread data in the operating system cache, otherwise closed with a RST packet), packet header byte entropy (protocol headers without plaintext are rare on the Internet[17]), traffic burst (resources originally distributed across multiple hosts are now all requested through the proxy, leading to a sudden increase in traffic), traffic concentration (requests originally directed to different websites are now all directed to the proxy), leaked illegal DNS requests, and unassociated DNS requests (only locally resolved but not actually accessed, and in fact accessed through the proxy). The latter four can also play a role in traffic analysis for other proxies. However, in practice, there seem to be other difficulties, as such proxies still survive with a fairly high probability.
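Of these indicators, packet header byte entropy is the easiest to make concrete. The following sketch computes the Shannon entropy of a byte string in bits per byte; the function name `byte_entropy` and the choice of how many leading bytes to examine are assumptions made for illustration, not details from reference [17].

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte, from 0.0 to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A plaintext protocol header repeats characters, so its entropy is lower
# than that of an equally long run of distinct byte values.
plaintext_header = b"GET / HTTP/1.1\r\n"
distinct_bytes = bytes(range(16))
```

A fully encrypted first packet tends toward the maximum entropy for its length, whereas most common protocols begin with a structured plaintext header, which is what makes the indicator usable.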

Tunnel proxies are proxies that embed data into another protocol. The external protocol must be encrypted, so SSH and TLS can be utilized, but real-time video streaming is more popular with designers. Typical examples include Trojan, OpenSSH’s port forwarding mode, DeltaShaper, CovertCast, and Facet[13, 19, 20]. The first two consume less additional resources, while the latter three, which utilize video streams, consume more additional resources, have lower throughput, but are noisier. Tunnel proxies are more difficult to attack in practice. In the case of Facet, for example, blocking 90% of Facet traffic using the most advanced attack methods requires blocking about 40% of all Skype video call traffic[20].

Simulation proxies are proxies that use a third-party implementation to simulate a well-known protocol. Typical examples include SkypeMorph, StegoTorus, and CensorSpoofer. As mentioned in reference [21], simulation proxies are not feasible because they require implementing the target protocol’s details, statistical properties, implementation quirks, and even implementation errors. But we believe it is possible if the protocol is generic, fully encrypted, with little plaintext, and is a public protocol.

Relay proxies are proxies that transfer data via accessible data relay points such as cloud storage services and social platforms. Uploading and downloading data during use is done using the standard methods provided by the platform, so they cannot be distinguished at the network connection level, and considering the risk of collateral damage, the entire service cannot be simply blocked. The performance of this method is close to that of the Tor network, with further degradation if the process involves steganography[10]. There are few users of this method. Examples include CloudTransport, PSTA, and others[10, 16]. It is difficult to attack.

End-to-Middle, as described earlier, is easy to detect but costly to bypass or block. A better approach is to compel the service provider, by means other than technology, to stop suffering the collateral damage that may result from the continued provision of the service. There is no real-world application of such technology yet.

TLS-Related has been described earlier; SNI-related compatibility is poor; the domain fronting technique relies on CDNs, but many CDNs block domain fronting[22]. Those that remain functional are still difficult to attack and not worth attacking.

A project named “West Chamber Plan” (西厢计划) is based on IPS evasion–related theories outlined in reference [23], and it is dedicated to finding vulnerabilities in the implementation of interception mechanisms to bypass attacks. The basic principle is to intentionally include reset packets at the beginning of the communication and to ignore response packets that match certain characteristics. There are two specific ways to do this: one is to intentionally send incorrect packets to mix RST packets in the normal TCP handshake, thereby making the attack program believe that the connection has been terminated; the other is to rely on intercepted responses to DNS poisoning collected by reconnaissance, summarize the characteristics and ignore the responses that match the characteristics, thus making the real response valid. Such practices are effective in the short term, but they are prone to failure and have significant limitations.

There is also a category of tools with a political background that rely on financial support to buy a large number of domain names and IPs, and rely on ordinary proxy techniques and dynamic replacement of proxy network addresses to force adversaries to expend resources. Not only are these tools not technically significant, but some of them also have huge security vulnerabilities and spy on users[24], so they are omitted here.

Another technical direction is to introduce many new protocols, including HTTP, HTTP/2, UDP, mKCP, MTProto, WebSocket, etc. In addition to directly creating new protocol tunnels, there are many combinations of obfuscation methods, such as TCP+TLS+Web, TCP+TLS shunt, WebSocket+TLS+Web, HTTP/2+TLS+Web, etc. In terms of specific usage, there are those that use public implementations, as well as those that re-implement on their own, and there is also a category that only adds some bytes before and after the packet as a feature. With the exception of schemes that use public implementations, all of them are easy to identify by traffic analysis. The introduction of these protocols has no practical significance beyond increasing the variety[22]. In fact, reality quickly proved this to be true.

Typical examples of this phase are the Shadowsocks’ obfuscation plugins, V2Ray, and Trojan. Those obfuscation plugins only worked for a short period of time and brought no theoretical innovations. V2Ray can be configured and paired with third-party software to achieve a wide variety of layers of shells and relays, and Trojan is a specialized implementation of TCP+TLS+Web. From the outside, it appears that the TLS tunnel–based approach is quite effective in practice as it leverages all the security features of TLS, along with the failing forward mechanism. The downside is twofold: heavy to run and cumbersome to configure.

Machine learning is a common strategy in the identification of encrypted proxy traffic, but there are still many issues that make machine learning impractical, such as unreasonable data sources, unreasonable classification standards, poor transferability of training results, high operating costs, a highest accuracy of only about 94%, the need to train on the target network itself, and an intolerably high false positive rate[25, 26, 27, 28, 29, 30, 31, 32].

There are several other methods that are not included in the above framework. For example, in a departure from the conventional thinking that the obfuscation protocol itself deals with obfuscation, sessions, and data transmission together, reference [33] adds an additional session layer for the proxy connection, dividing the proxy protocol into a user data layer, a session layer, and an obfuscation layer, thereby greatly reducing the overhead of re-establishing a connection and increasing the ability to maintain uninterrupted user communication across multiple or successive connections. Reference [34] proposes to establish a Websocket connection in a TLS tunnel, which can transmit any binary data, thus supporting any proxy protocol. If the CDN technology switches from the shared private key model to the proxy chain of trust model in the future, then CDN forwarding–based proxy technology may be revitalized[35].

3 Protocol design

3.1 Design ideas

The purpose of this method is to provide a proxy that combines both security and performance. The features include: adding concealment on the basis of confidentiality and integrity, avoiding performance issues caused by multiple encryption, and providing additional protection against both host intrusion and key compromise.

As mentioned above, there are six policies and eight categories of techniques for avoiding cyber attacks, but due to the various reasons mentioned earlier, only one mix of high-performance options is actually viable, and there are only three ideas for that mix:

  1. Provide a generic architecture and maximum flexibility;

  2. Provide a fast, simple, and featureless encrypted proxy;

  3. Nest a protocol that is common but cannot be easily disabled, and make the two theoretically indistinguishable.

There are many reasons for this, the most important of which are the choice of threat model, performance, and the complexity of initial deployment. Some methods settle for trusting third parties to provide services, a not very robust model. Some methods sacrifice too much network performance in order to achieve perfect concealment and anonymity, which is unbearable in most cases. There are also methods that provide additional security features and availability guarantees, but the initial deployment is complex or impossible to accomplish alone.

This method opts for a compromise version of idea (3), and is a specialized implementation of reference [34], which provides as many security features as possible without sacrificing performance or deployment complexity, and is easily expandable to enhance availability and security features. Other functions, including resistance to active probing, load balancing, reverse proxy, and traffic forwarding, are supported by a real nginx service. When working in tandem with nginx, it only performs simple forwarding tasks, allowing for the theoretically lowest performance overhead.

In this method, the proxy connection is initially established with complete TLSv1.3 handshake features, with the handshake protocol parameters embedded in the random and encrypted fields (e.g., the Random, Session ID, and Application Data segments) of the TLS handshake, and then the X25519-based ECDHE key exchange and challenge–response process are completed with the help of the Key Share extension. After the session is established, plaintext data and feature data are encrypted, while data that is already encrypted is forwarded directly. The local configuration file is padded to a fixed length and then encrypted using a key derived from the master password. A long-term key for initial authentication is randomly generated, and the configuration file is automatically generated, encrypted, and saved without the need to manually pass parameters. A more detailed process is described in the next section.
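The idea of hiding an authentication value in a field that an observer expects to be random can be sketched as follows. The layout used here (a 16-byte nonce followed by a 16-byte truncated HMAC-SHA256 tag, together filling the 32-byte Random field) is an assumption for illustration only and is not the field layout of FTLSocks; the function names are hypothetical, and the X25519 exchange is omitted because the Python standard library does not provide it.

```python
import hashlib
import hmac
import os

RANDOM_LEN = 32  # size of the Random field in a TLS ClientHello
NONCE_LEN = 16   # assumed split: 16-byte nonce + 16-byte tag


def make_client_random(long_term_key: bytes) -> bytes:
    """Build a Random value that only a holder of long_term_key can verify."""
    nonce = os.urandom(NONCE_LEN)
    tag = hmac.new(long_term_key, nonce, hashlib.sha256).digest()
    return nonce + tag[: RANDOM_LEN - NONCE_LEN]


def route_client_hello(long_term_key: bytes, client_random: bytes) -> str:
    """Return "proxy" for an authenticated client, otherwise "nginx"."""
    if len(client_random) != RANDOM_LEN:
        return "nginx"
    nonce, tag = client_random[:NONCE_LEN], client_random[NONCE_LEN:]
    expected = hmac.new(long_term_key, nonce, hashlib.sha256).digest()[: len(tag)]
    # Constant-time comparison avoids leaking how many tag bytes matched.
    return "proxy" if hmac.compare_digest(tag, expected) else "nginx"


key = bytes(range(32))  # stands in for the randomly generated long-term key
client_random = make_client_random(key)
```

To a party without the key, the value is indistinguishable from 32 random bytes, and a connection that fails the check is handed to the real web server rather than closed, so a probe sees ordinary server behavior. A real design would also need to bind the tag to a timestamp or to the key exchange to prevent replay.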

Although reference [21] claims that impersonating another protocol, especially a particular implementation of another proprietary protocol, is difficult and practically impossible, TLSv1.3 allows us to implement only a fraction of the TLS protocol and to refer to the source code of industrial standard implementations. Careful design ensures that the attacker cannot control the plaintext and that most of the time, real implementations can be exploited, further lowering the impersonation requirements. By taking full advantage of the excellent implementation and detailed features of nginx, it is possible to avoid dealing with unexpected cases of non-compliance with the protocol. In the end, only the Client Hello and the modification of encrypted data messages need to be handled by the proxy itself.

3.2 Introduction to the protocol

The Fake TLS Socks protocol is divided into three parts: the simplified Socks5 protocol (Fake TLS Socks Protocol, FTLSS for short), the disguised TLS handshake protocol (Fake TLS Handshake Protocol, FTLSH for short), and the transport protocol (Fake TLS Transport Protocol, FTLST for short). For ease of discussion, the FTLSocks client is also referred to as the Socks5 server below.

The FTLSS protocol is only responsible for the communication between the FTLSocks client and the network application. Then, the FTLSH protocol is used for bidirectional authentication and session key generation. After the handshake is completed, the FTLST protocol is responsible for the forwarding process. Proxy traffic is selectively encrypted and embedded in the encrypted portion of the Application Data packet. The encryption and decryption process maintains an incremental sequence number locally as an Additional Data input to the AEAD. During the entire communication process, the FTLSocks server may send five different TLS alert messages to the FTLSocks client, depending on the circumstances.

The relationship between the entities in the communication process is shown in Figure 1. If the template part is ignored, the effective data of the whole process is shown in Figure 2.

An architecture diagram. There are 5 major components outlined in boxes: socks5 client, FTLS Client, FTLS Server, Target, and Backward nginx. These are arranged from left to right, except that Target and Backward nginx are stacked vertically as two possibilities after FTLS Server. Between socks5 client and FTLS Server, there are labeled arrows: rightward “Handshake”, leftward “Support Method”, rightward “Connect CMD”, leftward “Success”, then finally a heavy bidirectional arrow “DATA FLOW”. The FTLS Client box contains sub-components. At the top, a heavy rightward arrow labeled “Command”. Then a bidirectional pipeline extending from the left side of the box to the right side: a cloud-shaped icon labeled “socks5 server”, a wavy rectangle labeled “BUFFER”, and a cloud-shaped icon labeled “TLS Connector”. At the bottom of the FTLS Client box, there is a beige rectangle with its upper-right corner folded over and with three lines of text: “Add Header”, “Add Length”, and “Encrypt”. Between FTLS Client and FTLS Server, there is another set of labeled arrows: rightward “Client Hello”, leftward “Server Hello”, rightward “Application Data”, leftward “ChangeCipherSpec AD”, then a heavy bidirectional arrow “AD FLOW”. The FTLS Server box contains sub-components. At the upper left, a speech balloon with the text “CMD in AD”. Then a bidirectional pipeline extending from the left side of the box to the right side: a cloud-shaped icon labeled “TLS Listener”, a wavy rectangle labeled “BUFFER”, and a cloud-shaped icon labeled “Outbound”. At the bottom of the FTLS Server box, there is a beige rectangle with its upper-right corner folded over and with three lines of text: “Reduce Header”, “Reduce Length”, and “Decrypt”. On the right side of the FTLS Server box, there are two outgoing heavy arrows: one labeled “Normal” that points up and right to the Target box, and one labeled “Check Fail” that points down and right to the Backward nginx box.
Figure 1: Entity relationship diagram
A communication wire diagram with three actors: FTLS Client, FTLS Server, and nginx. The first message is sent from client to server: “HMAC(time,data,psk),Challenge,ServerName,KeyExchange”. Then from server to nginx: “if illegal, transfer the previous packet” and back from nginx to server: “Real TLS Response”. The remaining communication is between the client and server only. The server sends “Real TLS Response”, and then, in a separate message, “PreviousHMAC,Response,KeyExchange,HMAC(data,sessionKey)”. The client sends “Socks_command,first application data” and then the server sends “if illegal, send Alert and close”. At the bottom of the vertical line for FTLS Client, below all the communication, there is an enclosed box labeled “Close when detect evil”. Between the FTLS Server and nginx, below all the communication, there is an enclosed box with rounded corners labeled “Transfer when detect evil Client”.
Figure 2: Protocol communication overview sequence diagram

The following are prerequisites for the deployment of a proxy communication system based on this protocol, without which the designed security cannot be achieved:

  1. A controlled domain name

  2. A valid certificate (optional)

  3. A dedicated VPS

The security of this protocol is based on the following assumptions:

The adversary has the ability to eavesdrop and initiate attacks, but cannot block the channel or gain direct access to the session key or long-term key. A legitimate client and server pre-share a long-term key. The user can establish a secure channel with hosts outside of the attack range for a short period of time for initial deployment and key distribution. The local host and self-deployed proxy server operate in a secure environment, and the cloud service provider will not disclose network communication data.

Sections 3.3 through 3.5 below detail the three sub-protocols of the FTLS protocol, FTLSS, FTLSH, and FTLST, respectively.

3.3 FTLSS

The FTLSS protocol is used to provide basic Socks5 services on the client side. The Socks5 proxy protocol is widely supported, easy to use, and supports both TCP and UDP. However, for our specific needs, it can be further simplified by retaining only the essential features. To put it simply, only Socks5’s no-authentication mode and the TCP Connect instruction are included here. This part occurs between the Socks client and the Socks server, corresponding to ① in Figure 3.

A communication diagram with 5 actors, represented by tall rounded rectangles, from left to right: Socks Client, Socks Server, FTLS Client, FTLS Server, and Target. FTLS Client and FTLS Server have an additional annotation on them, a beige rectangle with a folded-over corner and the text: “Check TLS Handshake to decide whether it’s a encrypted channel”. Between the actors there are communication arrows, often marked with circled digits, ① to ⑨. ① is a block of communication arrows, enclosed in its own box between Socks Client and Socks Server: rightward “Handshake”, leftward “Support Method”, rightward “Connect CMD”, leftward “Success”, rightward “Data 1”. ② is another enclosed block between Socks Server and FTLS Client: rightward “Command”, rightward “Data 1”. ③ is another enclosed block between FTLS Client and FTLS Server: rightward “Client Hello”, leftward “Server Hello”, rightward “CMD & Data 1 in AD”. ④ is a rightward arrow “Data 1” from FTLS Server to Target. ④ has a speech balloon coming from it that reads “Can be Client Hello”. ⑤ is a leftward arrow “Data 2” from Target to FTLS Server. ⑤ has a speech balloon coming from it that reads “Can be Server Hello”. ⑥ is a leftward arrow “Data 2 in AD” from FTLS Server to FTLS Client. ⑦ is a leftward arrow “Data 2” from FTLS Client to Socks Server. ⑧ is a leftward arrow “Data 2” from Socks Server to Socks Client. ⑨ is marked on a continuous sequence of arrows that go through all the actors: “Data 3”, “Data 3”, “Data 3 in AD”, “Data 3”.
Figure 3: Communication sequence diagram

Handshake phase: Upon receiving the negotiation request from the client, the Socks5 server checks if it contains the no-authentication mode, and there are only two possible values returned to the client:

Establish a connection: The Socks client sends a command, and the Socks server establishes a new connection with the FTLSocks server upon receipt of the instruction. The response here is always: ‘0x05 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x10 0x10’, which indicates acceptance of the proxy request. Upon receiving this response, the client sends a first raw request packet; i.e., Data 1 in Figure 3. The FTLS client behaves differently from a standard Socks5 server at this point, with the Socks instruction and the first request packet being merged into a single packet and then encrypted before being forwarded, as shown in ② in Figure 3.
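The two FTLSS steps can be sketched in a few lines of Python. This is a minimal illustration rather than the FTLSocks implementation: the two greeting replies (`0x05 0x00` when the no-authentication method is offered, `0x05 0xFF` otherwise) follow RFC 1928 and are an assumption about the values the paper lists, and the function names and the length-prefixed merge framing are hypothetical.

```python
NO_AUTH = 0x00        # RFC 1928: "no authentication required"
NO_ACCEPTABLE = 0xFF  # RFC 1928: "no acceptable methods"

# Fixed success reply quoted in the text above.
FIXED_REPLY = bytes([0x05, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x10, 0x10])


def select_method(greeting: bytes) -> bytes:
    """Answer a Socks5 greeting (VER, NMETHODS, METHODS...)."""
    if len(greeting) < 2 or greeting[0] != 0x05:
        raise ValueError("not a Socks5 greeting")
    nmethods = greeting[1]
    methods = greeting[2:2 + nmethods]
    if len(methods) != nmethods:
        raise ValueError("truncated greeting")
    chosen = NO_AUTH if NO_AUTH in methods else NO_ACCEPTABLE
    return bytes([0x05, chosen])


def handle_request(request: bytes) -> bytes:
    """Accept only the TCP CONNECT command (CMD = 0x01)."""
    if len(request) < 4 or request[0] != 0x05 or request[1] != 0x01:
        raise ValueError("only Socks5 CONNECT is supported")
    return FIXED_REPLY


def merge_for_tunnel(request: bytes, first_data: bytes) -> bytes:
    """Join the Socks command and Data 1 into one payload.

    The 2-byte length prefix is a hypothetical framing, used here only so
    that the receiver could split the two parts again.
    """
    return len(request).to_bytes(2, "big") + request + first_data
```

The merged payload is what would then be encrypted and carried in the first Application Data record of the tunnel.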

3.4 FTLSH

The FTLSH protocol is used to establish a secure session. In order to understand the design idea of FTLSH, let’s first look at the three phases of the TLSv1.3 handshake:

Key Exchange: Establish shared key data and select the cryptographic parameters. After that, all data is encrypted.

Server Parameters: Establish other handshake parameters.

Authentication: Authenticate the server (and, optionally, the client) and provide key confirmation and integrity.

To achieve the disguise purpose, we only need to focus on the first phase. Take the usual parameters used by TLSv1.3 to establish an HTTPS connection and insert the parameters required for FTLSH into them. Specifically, the data obtained by using Firefox 71 to access a well-known website that fully supports TLSv1.3 is used as a template. The methods of disguise are shown in Figure 2, corresponding to ③ in Figure 3, and are described as follows:

  1. The interaction between the client and the server begins with four packets. The client first sends Client Hello. The server then responds with Server Hello, followed by Change Cipher Spec and Application Data. The client then sends Change Cipher Spec and Application Data. Thereafter, both parties send Application Data unless a serious error occurs in the connection.

  2. Throughout the handshake, if everything is normal, there should be no alerts. All four fatal alerts may be triggered if a very serious network error is encountered or an attacker inserts malicious traffic. The correspondence between errors and alerts is subject to RFC 8446.

  3. Unencrypted Client Hello and Server Hello messages are constructed by template filling, and from the Application Data onwards, the communication is encrypted.

  4. Change Cipher Spec and Alert are fixed content that can be sent directly when appropriate, with no special processing required before sending.

  5. Both Client Hello and Server Hello have 64 bytes of space for Random and Session ID. The Session ID carries an HMAC-SHA256((timestamp, all remaining data), PSK) value, and a 32-byte random value is placed in Random as the challenge. The acceptable margin of error for the timestamp is [-1, 0].

  6. Both parties use the Key Exchange field to perform an ECDHE key exchange using X25519.

  7. The server and client perform authentication separately. For the server, authenticate SessionID==HMAC((timestamp, all remaining data), PSK); for the client, authenticate First_AD_Payload==HMAC(all_remain_data, PSK) && response==HMAC(challenge, session_key). If both authentications succeed, then the connection is established.

  8. For the first message, if the client fails to authenticate, it will disconnect directly, with the aim of identifying whether there is a malicious server. If the server fails to authenticate, it forwards the packet to nginx and sends back the response from nginx, with the aim of identifying whether there is a malicious client. If the server fails to authenticate and fails to connect to the back-end disguise service, it will send a Close Notify Alert to the opposite end and close the TCP connection immediately (configuring an available disguise service is mandatory).

  9. For the second message, if the client fails to authenticate, it will directly disconnect (indicating that there is a server that is trying to attack but the PSK has not yet been leaked), and if the server fails to authenticate (indicating that the first message has been successfully replayed), it will send BadRecordMAC and immediately close the TCP connection.

A special note should be made about the size of the handshake packet:

  1. According to RFC 8446 requirements, a Client Hello less than 512 bytes should be padded to 512 bytes, so in cases where the length of the domain name in this template is not more than 160 characters, the Client Hello is fixed at 512 bytes in length. In order to simplify the protocol and increase speed, domain names longer than 160 characters are not supported.

  2. If the same HTTPS server always responds with the same Server Hello to the same Client Hello, then the Server Hello is also of fixed length.

  3. Server Hello will be followed by an Application Data packet of 32 bytes in length for completing the subsequent TLS handshake.

  4. The length of these packets remains unchanged, which is in line with the requirements of the standard. Not all handshake processes of websites using TLSv1.3 follow this method, but this method is recommended and popular.

3.5 FTLST

The FTLST protocol is used to provide forwarding and encrypted traffic shunt features. There are two paths into the FTLST phase. The first path is to detect the first handshake packet as illegal, in which case the traffic is directly forwarded to the disguise exit without any processing; the second path is to provide the service normally after correct interaction.

The second path can be specifically divided into two cases: first, the client and the server independently run the traffic shunt algorithm, determining whether the current proxy connection is a TLS connection by recording whether the first two packets after the handshake are the Client Hello and Server Hello of the TLS protocol. If it is a TLS connection, for Application Data packets in the subsequent communication, the FTLSocks client forwards them directly, and the server determines whether the current packet is Application Data processed by the client or raw Application Data through trial decryption. For other types of packets, the client does not need to check and directly encrypts them before embedding them in the Application Data field for forwarding. If it is not a TLS connection, the client directly encrypts all subsequent packets, and the server decrypts them and forwards them to the proxy target.

In summary, three different forwarding policies may be adopted depending on the state of the previous connection, based on two main ideas: no multiple encryption, and no actual processing of the TLS protocol. This corresponds to ④–⑨ in Figure 3.

No multiple encryption: The traffic shunt policy is shown in Figure 4. “No multiple encryption” reduces the computational cost at both ends, thus providing higher traffic processing capacity with limited computing power. The process of handling non-encrypted traffic is shown in Figure 5, and the process of handling encrypted traffic is shown in Figure 6.

A flowchart. An annotation on the side (beige rectangle with folded-over corner) reads “Client Hello; Correct Header; Server Hello: Correct Header; Consider as TLS channel if both are satisfied; No encryption for Application Data in TLS channel, send it directly; Any other packet will be packed ad Application Data”. The initial node is “socks handshake”. It points to a process node “Begin Transfer”. Control then passes to a battery of decision nodes connected in series: “is Client Hello” (additionally annotated with a speech balloon reading “Encrypted”), “is Server Hellois” (with a similar “Encrypted” speech balloon), and “AD packet” (no speech balloon). The “Yes” edges go to the next decision node, and then finally to a predefined process node (rectangle with double-struck vertical edges) labeled “Send it directly”. All the “No” edges go to a predefined process node labeled “Pack as Application Data”, which has an “Encrypted” speech balloon attached to it.
Figure 4: Traffic shunt flowchart
A communication diagram. Four actors are represented by boxes arranged left to right: “Socks Server”, “FTLS Client”, “FTLS Server”, and “Target”. Between Socks Server and FTLS Client, there are two arrows (one pointing right and one pointing left) labeled “Plain”. Similarly, between FTLS Client and FTLS Server, there are arrows “Enc into AD”; and between FTLS Server and Target, arrows labeled “Plain”. The FTLS Client box contains a heavy rightward arrow “ENC” and a heavy leftward arrow “DEC”; the FTLS Server box has “DEC” pointing right and “ENC” pointing left.
Figure 5: Schematic diagram of non-encrypted communication
A communication diagram with the same basic layout as Figure 5: “Socks Server”, “FTLS Client”, “FTLS Server”, and “Target” arranged from left to right. Between Socks Server and FTLS Client, there are two arrows (one pointing right and one pointing left) labeled “Cipher”. Similarly, between FTLS Client and FTLS Server, there are arrows “Origin”; and between FTLS Server and Target, arrows labeled “Cipher”. The FTLS Client and FTLS servers each contain a heavy bidirectional arrow “DIRECT IF AD”.
Figure 6: Schematic diagram of encrypted communication
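The client-side shunt decision of Figure 4 can be sketched as a small state machine. This is an illustrative sketch, not the FTLSocks code: the class and function names are hypothetical, and recognizing a hello only by its record type, major version byte, and handshake type is an assumed reading of the "Correct Header" check.

```python
HANDSHAKE = 0x16         # TLS record type: handshake
APPLICATION_DATA = 0x17  # TLS record type: application data
CLIENT_HELLO = 0x01      # handshake message type
SERVER_HELLO = 0x02      # handshake message type


def _is_hello(packet: bytes, handshake_type: int) -> bool:
    # TLS record header: type(1) version(2) length(2); handshake type is byte 5.
    return (len(packet) >= 6 and packet[0] == HANDSHAKE
            and packet[1] == 0x03 and packet[5] == handshake_type)


class Shunt:
    """Decide per proxied packet: forward directly, or wrap in encrypted AD."""

    def __init__(self) -> None:
        self.seen = 0                 # proxied packets inspected so far
        self.client_hello_ok = False
        self.tls_channel = False

    def decide(self, packet: bytes) -> str:
        self.seen += 1
        if self.seen == 1:
            self.client_hello_ok = _is_hello(packet, CLIENT_HELLO)
            return "encrypt"          # the inner hello itself is still encrypted
        if self.seen == 2:
            self.tls_channel = (self.client_hello_ok
                                and _is_hello(packet, SERVER_HELLO))
            return "encrypt"
        if self.tls_channel and len(packet) >= 5 and packet[0] == APPLICATION_DATA:
            return "direct"           # already ciphertext: no second encryption
        return "encrypt"
```

Only inner Application Data records on a connection recognized as TLS bypass encryption; everything else, including the inner hellos, is packed into an encrypted Application Data record.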

No actual processing of TLS: “No actual processing of TLS” offers the possibility of lower response time while reducing server computation. In contrast to the complex processing of the full TLS protocol, FTLST only involves an encrypted traffic shunt, packet sequence number incrementation, and five types of alert messages. The sequence number is incremented by 1 for each successful encryption, and a 12-byte random value is transmitted along with the encrypted data as a nonce. This is discussed in more detail in Sections 4.2.3 and 4.4.1.

4 Security analysis

The following sections analyze the security of the FTLS protocol in the face of eavesdropping attacks (obtaining confidential information by monitoring the communication process), man-in-the-middle attacks (deceiving both parties into believing that the attacker is the other party), replay attacks (re-sending data captured during eavesdropping to gain trust or cause unauthorized operations), data tampering (modifying intercepted communication data before sending it to the original receiver), and active attacks (where the attacker pretends to be one party of the communication and interacts with the other party, then uses information obtained during the interaction to mount further attacks), and discuss the security in the case of key exposures. Finally, a brief outline is given regarding the possibility of attackers using traffic analysis techniques (analyzing the length, content, and characteristics of network traffic to identify specific types of network activities) to identify disguise attempts and countermeasures.

4.1 Eavesdropping and man-in-the-middle

The master key is 256 bits in length and is generated using a CSPRNG. The use of the ECDHE mechanism for key exchange and AEAD for encryption ensures the resistance to eavesdropping attacks.

The handshake process includes a bidirectional challenge–response where both parties must simultaneously use the PSK and the session key negotiated for that session to prove their identities. For the server, the PSK is used to generate an HMAC, and the session key is used for the first time to compute the response; for the client, the PSK is used to generate an HMAC, and the session key is used for the first time to request data encryption. Thus, a man-in-the-middle attack is also infeasible.
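The session-key half of this challenge–response can be illustrated with a short sketch. It assumes HMAC-SHA256 with the session key as the MAC key and the 32-byte challenge as the message; the function names are hypothetical, and the session key, which in the protocol comes out of the X25519 exchange, is replaced here by random bytes.

```python
import hashlib
import hmac
import os


def new_challenge() -> bytes:
    """32-byte random challenge, as carried in the Random field."""
    return os.urandom(32)


def make_response(session_key: bytes, challenge: bytes) -> bytes:
    """response = HMAC(challenge, session_key)."""
    return hmac.new(session_key, challenge, hashlib.sha256).digest()


def check_response(session_key: bytes, challenge: bytes, response: bytes) -> bool:
    """Constant-time comparison of the received response."""
    return hmac.compare_digest(make_response(session_key, challenge), response)


# Placeholder for the ECDHE output; a man in the middle who relays both
# hellos never learns this value and so cannot answer the challenge.
session_key = os.urandom(32)
challenge = new_challenge()
response = make_response(session_key, challenge)
```

Because the response depends on a key that only the two ECDHE endpoints can compute, relaying or replaying the hello messages is not enough to pass this check.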

4.2 Replay

Generally speaking, in a normal, real TLS handshake (neither 0-RTT nor resumed session), Client Hello and Server Hello do not contain confidential information, and even if replayed, they cannot successfully establish a TLS session. Therefore, there is no need to worry about replay attacks. However, since the FTLS protocol treats these two packets as a source of authentication information, the impact of replay must be fully considered.

4.2.1 Client Hello

The HMAC-SHA256 value of the timestamp is used as a basic measure against replay, with a validity period of 2 s. It can be seen that a Client Hello may be successfully replayed. However, since the key exchange is completed immediately before the second round of interaction that follows, subsequent sessions require authentication and encryption using the session key, making it impossible for an attacker conducting a replay to calculate the correct session key. Therefore, it is impossible to resend legitimate packets. Thus, the replay cannot succeed, and the second packet will surely be erroneous and will be rejected by a BadRecordMAC alert message. The ability for an attacker to replay a Client Hello does not pose a real risk, because even if the replay is successful, the attacker will not get any different feedback than a normal server.

4.2.2 Server Hello

Due to the inherent nature of the Server Hello, it cannot be replayed. The Server Hello contains an HMAC value to protect its integrity, and the padded Session ID must match the corresponding Client Hello. It is impossible for an attacker to perform a replay while maintaining consistency in the Session ID without compromising integrity.

4.2.3 Any subsequent packets

Each subsequent packet involves AEAD encryption. The key to preventing replay lies in the use of a nonce and a sequence number. The nonce is a 12-byte random value transmitted with the encrypted payload; the sequence number is not transmitted but is locally incremented as Additional Data in AEAD, incrementing by one with each successful encryption or decryption. Replay, truncation, tampering, or reordering will result in AEAD authentication failure (BadRecordMAC alert) and interrupt the connection, requiring a re-handshake to restore communication.
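The effect of the implicit sequence number can be demonstrated with a small sketch. To stay within the Python standard library, it substitutes an HMAC-SHA256 tag for a real AEAD cipher, so it provides authentication only and no confidentiality; the class name and record layout are hypothetical. The point it shows is that a counter which is never transmitted, but is fed into the tag computation as associated data, makes replayed or reordered records fail verification.

```python
import hashlib
import hmac
import os
import struct

NONCE_LEN = 12
TAG_LEN = 32


class Direction:
    """One direction of a session; both ends keep their own copy of seq."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.seq = 0  # never sent on the wire

    def _tag(self, nonce: bytes, payload: bytes) -> bytes:
        aad = struct.pack(">Q", self.seq)  # sequence number as associated data
        return hmac.new(self.key, aad + nonce + payload, hashlib.sha256).digest()

    def seal(self, payload: bytes) -> bytes:
        nonce = os.urandom(NONCE_LEN)      # random nonce travels with the record
        record = nonce + payload + self._tag(nonce, payload)
        self.seq += 1                      # advance only after success
        return record

    def open(self, record: bytes) -> bytes:
        if len(record) < NONCE_LEN + TAG_LEN:
            raise ValueError("bad_record_mac")
        nonce = record[:NONCE_LEN]
        payload = record[NONCE_LEN:-TAG_LEN]
        tag = record[-TAG_LEN:]
        if not hmac.compare_digest(self._tag(nonce, payload), tag):
            raise ValueError("bad_record_mac")
        self.seq += 1
        return payload
```

A receiver that has already accepted record 0 expects sequence number 1, so a second copy of record 0 no longer verifies; in the protocol this failure maps to the BadRecordMAC alert and the connection is closed.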

4.3 Tampering

Similar to what is described in the previous section, there is no need to worry about tampering with the Client Hello and Server Hello in TLS, but both must be integrity-protected in the FTLS protocol in order to simplify the simulation of the TLS handshake process and to avoid dealing with anomalous handshakes. In the first round of interaction before key exchange, the two packets use HMAC for integrity protection, and after the key exchange is complete, AEAD is used for integrity protection. The unencrypted data portion is a fixed value and checked. In summary, tampering can be detected immediately.

If authentication fails, it indicates that the other party has not followed the FTLS protocol, and the handshake packets received will be processed by the nginx process to ensure that there is a correct response to anomalous handshake packets. As a result, attackers cannot distinguish whether the current service is FTLS by constructing special handshake packets.

4.4 Active attacks

From the time a connection is initiated until the handshake is completed, many steps may encounter active attacks, and each is analyzed separately below. Due to the inherent nature of the protocol, if it is detected as different from a regular TLS service, it loses concealment, which can be regarded as “not achieving the design goal”.

4.4.1 Active probing

For reasons that are well known, let’s start with countermeasures against active probing. Active probing refers to the process and means by which an attacker disguises themself as a client and actively sends probes to a suspicious server and determines whether the suspicious service is a specific target through the response.

Forwarding of failure: It targets the handshake process, aiming to prevent a prober or any visitor without the correct access credentials from detecting the difference between the active FTLS service and the TLS service by direct access.

Specifically, if the first handshake packet (Client Hello) fails to authenticate, it is forwarded directly to the disguise service without any processing; if the first handshake packet is successfully authenticated (it must have been successfully replayed within a valid time), then authentication is performed on the second handshake packet, but it is impossible for any illegal peers to pass authentication due to their inability to compute the correct session key. Once authentication fails, a BadRecordMAC message is sent and the TCP connection is closed, as per the TLS protocol. Closing the connection here does not sacrifice concealment, as this is how any HTTPS service handles non-compliant packets when it receives them.

Detect and close attacked connections: In order not to appear different from a regular TLS service when under active attack, it is necessary to give an appropriate alert message when a connection error occurs and to handle correctly whether to continue the connection or to disconnect it. According to the previous analysis, it is impossible for an attacker to directly interact with the proxy service in a legitimate way with controllable plaintext, so inserting an attack probe will inevitably lead to a fatal error. There are six possible scenarios, namely, tampering with the handshake, tampering with the plaintext protocol header, tampering with the ciphertext, replaying the handshake, replaying Application Data, and reordering. The corresponding alert message can only be one of the four fatal error alerts that cause TLS connection termination: IllegalParameter, BadRecordMAC, UnexpectedMessage, and RecordOverflow. To counteract such probing, simply issue an alert and close the connection appropriately, depending on when and how the malicious attack probe was inserted.

Close the connection upon receiving a fatal error alert: While an attacker cannot inject valid ciphertext, it is possible to inject a fatal error alert message in plaintext. According to the RFC, the connection should be closed when a fatal error alert is received, so it is sufficient to follow the protocol and close the connection at this point.

4.4.2 Rogue client

Assuming the attacker does not get the long-term key, when an attacker disguises as a client to communicate with the server, they will be unable to correctly construct the first packet due to the lack of a valid pre-shared long-term key, and thus the disguise attempt will fail. After handshake authentication fails, the attacker is directed to the disguise service and cannot obtain any valid information.

4.4.3 Illegal server

When an attacker disguises as a server to communicate with a client, due to the lack of a valid pre-shared long-term key, they cannot correctly calculate the MAC value of the Server Hello, and thus the disguise attempt will fail. After failure, the client will not continue the handshake but will send UserCanceled followed by CloseNotify, and then close the connection. The attacker gets a valid Client Hello but cannot replay it or obtain the long-term key, rendering further attacks useless.

The client’s behavior here differs from that of a normal client and may be detected as anomalous. However, normal clients also occasionally close the connection directly. A period of silence may help to avoid exposing the presence of the client, but given the dynamic nature of the client’s IP and the huge impact on service availability, this active silence feature has not been implemented.

4.4.4 Transmission phase

The transmission phase is essentially a combination of TLS header and AEAD encrypted data. An UnexpectedMessage alert is sent when the TLS header is detected as illegal. If a huge data frame is inserted or the length field is modified to be extra-long, a RecordOverflow alert is sent. The tampering and replay scenarios have been covered earlier and will not be repeated.
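The header checks of the transmission phase can be sketched as below. The alert description codes and the record length limit of 2^14 + 256 bytes are taken from RFC 8446; the function names are hypothetical, and treating every record type other than Application Data as unexpected is a simplifying assumption of this sketch.

```python
import struct

ALERT = 0x15
APPLICATION_DATA = 0x17
FATAL = 0x02

# Alert descriptions from RFC 8446.
UNEXPECTED_MESSAGE = 10
BAD_RECORD_MAC = 20
RECORD_OVERFLOW = 22
ILLEGAL_PARAMETER = 47

MAX_CIPHERTEXT = 2**14 + 256  # RFC 8446 limit on the record length field


def fatal_alert(description: int) -> bytes:
    """Plaintext alert record: type, version 0x0303, length 2, level, description."""
    return bytes([ALERT, 0x03, 0x03, 0x00, 0x02, FATAL, description])


def check_header(header: bytes):
    """Return None if the 5-byte record header is acceptable, else an alert code."""
    if len(header) < 5:
        raise ValueError("need a full 5-byte record header")
    content_type, version, length = struct.unpack(">BHH", header[:5])
    if content_type != APPLICATION_DATA or version != 0x0303:
        return UNEXPECTED_MESSAGE
    if length > MAX_CIPHERTEXT:
        return RECORD_OVERFLOW
    return None
```

A caller would send `fatal_alert(code)` and close the TCP connection whenever `check_header` returns a code, and use `BAD_RECORD_MAC` when the subsequent AEAD verification fails.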

4.5 Key exposures

While defenders may prioritize connection availability over the confidentiality of communications, there are many scenarios in which it is attractive to provide additional protection against varying degrees of key exposures.

The following describes additional protection measures for proxy users and proxy data in two different scenarios.

4.5.1 Session key exposures

Due to software flaws, memory content may be leaked, leading to exposure of the current session key. The session key is negotiated via ECDH and is only used for the current connection, thus providing forward security. At this point there can be no threat to any other sessions, previous sessions, or the master key.

4.5.2 Local intrusion

In the event of an intrusion, confidential information contained in locally stored configuration files may be compromised, leading to exposure of the long-term key. Six measures are used to deal with this scenario:

  1. Enforce the use of longer and more complex master passwords when creating a configuration file to prevent brute-force attacks.

  2. Use the Argon2id algorithm with parameters time=100, memory=32*1024, threads=4, and keylen=32 to derive the master key from the master password to encrypt the configuration file. Add a 32-byte salt in the key derivation process. A key is randomly generated each time a configuration file is generated and appended to the end of the configuration file to prevent brute-force attacks. The recommended parameters for the Argon2id algorithm include memory=64*1024. We use half of the recommended parameter but expand the time parameter to 100 times the recommended value, so the overall security is not reduced. This does not negatively affect performance because it only runs once at startup. Reference time consumed: 926.12ms on an Intel Core i5-8300H.

  3. Do not accept the master password from command line parameters, but enter it manually in subsequent steps without echo, to prevent password leakage in command history.

  4. Overwrite the master password in memory immediately after loading the configuration file.

  5. Configuration file generation is divided into two parts. The client configuration file does not contain server-side parameters (disguise exit address, time-out period, coroutine pool size, etc.). The two configuration files have the same master password and size but different salt values.

  6. Before encryption, the configuration file is padded to a fixed length to ensure that no information about the configuration is leaked due to file size, such as domain name length.
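The fixed-length padding in the last measure can be sketched as follows. The paper does not describe the padding format, so the 4096-byte size, the 4-byte length prefix, and the zero fill are all hypothetical choices made for illustration; encryption with the Argon2id-derived key would be applied to the padded result and is not shown.

```python
import struct

FIXED_LEN = 4096  # hypothetical fixed size of every padded configuration file


def pad_config(raw: bytes, size: int = FIXED_LEN) -> bytes:
    """Prefix the true length, then zero-fill to exactly `size` bytes."""
    if len(raw) > size - 4:
        raise ValueError("configuration too large for the fixed size")
    return struct.pack(">I", len(raw)) + raw + b"\x00" * (size - 4 - len(raw))


def unpad_config(padded: bytes) -> bytes:
    """Recover the original bytes from a padded configuration."""
    (n,) = struct.unpack(">I", padded[:4])
    if n > len(padded) - 4:
        raise ValueError("corrupt length prefix")
    return padded[4:4 + n]
```

With this layout a short and a long domain name produce files of identical size, so the ciphertext length reveals nothing about the configuration.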

In conclusion, even if a malicious user gains local access to the client or server, it is highly unlikely that they will obtain the long-term key or use information left by this tool to help locate the server or decrypt communication content.

4.6 Traffic analysis

The time-out period, connection duration, and packet length distribution are the main components of traffic analysis. The time-out period is configurable, while the latter two depend on the connection passing through the proxy. The default time-out period of 60 s is a common choice on the Internet but can be modified as needed.

Without additional processing, the latter two factors directly reflect how the proxy service is used, and an observer can compare them with the traffic that the claimed service would plausibly generate to judge whether the connection is suspicious. We deliberately apply no such processing, because more aggressive concealment would sacrifice a great deal of performance. This does not make FTLSocks more suspicious than existing stable tools, and the degree of suspicion can be reduced by choosing a more reasonable disguise service. In fact, even padding with random bytes does not make the packet length distribution look more normal.

In summary, the security analysis can be broadly categorized into four scenarios: the first packet is detected as an attack; the first packet is successfully replayed, and the second step is reasonably rejected or timed out and disconnected; passive observation shows no difference; and active probing is reasonably responded to and disconnected. The key to unobservability is that thorough integrity protection from start to finish ensures that interactions take place within the agreed-upon framework without having to think too much about the details. The only thing that has to be taken care of is the proxy’s functionality; anything outside the framework is handled by the real public implementation of TLS.

5 Comparison and analysis

The following sections analyze the advantages of the FTLS protocol in comparison with various popular software in terms of security and performance. Forward security, performance, resistance to single point of failure and ease of deployment are the outstanding advantages of FTLSocks.

5.1 Security comparison

The following is an analytical comparison between several popular tools using the same policy and other representative tools.

Table 1: Feature comparison
Tool | Distributed | Symmetry model | Flexibility | Resistance to active probing | Forward security | Deployment complexity | Efficiency | Single point of failure
V2Ray | No | Mix | High | Mix | Mix | Medium-high | Low | Mitigable
Trojan | No | Mix | Low | Yes | Yes | Medium | Medium-low | Yes
Shadowsocks | No | Yes | N/A | Yes | No | Low | High | Yes
Tor with Obfs4 | Yes | No | Low | Yes | Yes | Medium | Extremely low | No
Lantern | Yes | No | Low | No | Unknown | High | Medium | No
FTLSocks | No | Mix | Medium | Yes | Yes | Medium-low | High | No

V2Ray: It is a general-purpose toolkit and a generic network communication framework that provides diverse protocol support and functional features and is highly customizable. By modifying the configuration files, a wide range of functions can be implemented, and external traffic can take on various forms. It also has built-in routing and DNS, and with minor modifications, most new obfuscation schemes can be implemented. The officially released core toolkit is implemented in Go, but many third-party implementations exist as well. The downside is the large extra overhead when customizing multi-layer communication protocols, sometimes with double or triple encryption. Overall, V2Ray takes an all-encompassing approach, allowing new schemes to maximally benefit from common modules such as built-in DNS, connection multiplexing, and traffic shunting engines, thus ensuring that new schemes can immediately provide a good user experience.

Trojan: A tool designed with the sole purpose of evading cyber attacks by sending a Socks-like proxy protocol over HTTPS traffic. Essentially, it is a reverse proxy server that distributes traffic based on the rule of “whether received packets can be parsed as legitimate protocols”. The officially released tool is implemented in C++, with OpenSSL 1.1.1d as the HTTPS implementation. Its drawbacks include large overhead for actually establishing TLS connections, slow new connection response times, multiple encryption of large amounts of HTTPS traffic, and an inability to be placed behind a reverse proxy for load balancing. Running a complete website locally to provide basic countermeasures against active detection would be a significant burden on the server, hindering large-scale deployment. Using a static website or a dynamic website whose content is not updated renders the forwarding-failures mechanism meaningless, as fixed content makes the server easy to suspect and identify. It also lacks a convenient and secure configuration method, as its configuration can only be edited directly in a plaintext JSON file. Its current implementation (version 1.14.0) has several flaws: no good default password policy, slow handling of illegitimate requests, weak multi-core computing power, and incomplete reverse proxy functionality (notably, it cannot reverse proxy HTTPS). Trojan ditched the bloated architecture of V2Ray, took the TLS nesting mode out separately, and modified the first step of the TLS handshake process. Combined with first-packet authentication and forwarding of failures, it ended up with a more stealthy and efficient proxy model. But at the same time, it also lost reverse proxying, multi-level proxying, load balancing, single-point-of-failure resistance, and many other features and network architecture flexibility.

Shadowsocks: A simple, single-purpose, and fast encrypted Socks proxy. It transmits a Socks-like protocol but splits the traditional Socks server into a client and a server, with encryption between the two ends. The server attempts decryption upon receiving data; if the decrypted protocol parses successfully, the function is executed; otherwise, nothing happens. In addition, it can also shunt traffic based on the address to be accessed before forwarding. It supports obfuscation plugins for preprocessing traffic to add additional features, but to date these obfuscations have been only marginally useful due to the way the plugins work. Shadowsocks has no official version, and currently popular implementations include shadowsocks-libev implemented in C, shadowsocks-go2 implemented in Go, and the Python implementation that remains at version 2.8.2 (08/10/2015). The various implementations have different functional features, but thanks to Shadowsocks’s design philosophy, they always maintain fast processing capabilities and traffic with no distinctive features. Its advantage lies in its simplicity and speed, while its disadvantage is that it is overly simplistic and crude, which can easily raise suspicion. In Section 2.4, we introduced the weaknesses and detection methods of such proxies, but it must be admitted that this approach still has room to survive and performance advantages unmatched by other methods.

Tor with Obfs4: This is a method of secretly connecting to Tor bridges, a way to connect to the Tor network. Here we focus on Obfs4. Like Shadowsocks, it is an anti–active probing proxy, but specifically for connecting to Tor bridges. Many defense methods ultimately connect to the Tor network and rely on the Tor network to provide the remaining services, taking full advantage of the anonymity of the Tor network. Everything has its pros and cons, and anonymity and efficiency are a pair of irreconcilable contradictions. The way the Tor network works makes it very difficult to achieve higher performance. It might be a better way to get flexibly configurable anonymity, security features, and performance through a tool that is not bloated and can be used in different ways.

Lantern: A cross-platform, high-availability commercial proxy tool. Lantern’s protocol, Lampshade, and the way it operates are the most complex among the protocols discussed. It combines centralized and distributed proxy technologies, with a central server deployed by the maintainer providing primary services, while users provide secondary services as P2P nodes. To improve performance and availability, Lantern sacrifices anonymity, making its service addresses easier for attackers to discover[12]. Due to its deployment complexity, commercial nature, and extremely limited configurability, we do not consider Lantern a promising or trustworthy tool.

FTLSocks: An implementation of the FTLS protocol, developed by the authors of this paper. It was proposed in order to provide a secure, fast, lightweight, configurable, and feature-selectable proxy tool. The initial goal was to reduce the burden imposed by the Trojan implementation and to provide similar defensive features. The idea behind the disguise is a disguised TLS tunnel proxy, not a complete TLS implementation. It implements only half of the fake handshake process and handles some alerts. By combining the first-packet authentication mechanism and the forwarding-failures mechanism, connections that fail authentication are handled by nginx, thereby saving the implementation of TLS-related processes and fully utilizing nginx’s high-performance implementation and well-developed reverse proxy functionality. The protocol is designed to achieve features such as traffic shunting, load balancing, and single-point-of-failure resistance by being deployed in various ways or flexibly cooperating with third-party software. On its own, it provides basic security features such as forward security and confidentiality, while ensuring high performance. Furthermore, it provides a better default key policy and a configuration wizard.
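The combination of first-packet authentication and failure forwarding described above amounts to a dispatch decision on the first packet. The following is a minimal sketch of that decision only. The HMAC-SHA256 tag, `TAG_LEN`, `is_authentic`, and `route` are hypothetical stand-ins introduced for illustration; they are not the FTLS authentication scheme, which is defined by the protocol's fake TLSv1.3 handshake.

```python
import hashlib
import hmac

TAG_LEN = 32  # hypothetical: length of the authentication tag in the first packet


def is_authentic(first_packet, key):
    """Stand-in check: the last TAG_LEN bytes are an HMAC-SHA256 over the rest.

    This is NOT the FTLS authentication scheme; it only marks the place
    where the real first-packet verification would be called.
    """
    if len(first_packet) <= TAG_LEN:
        return False
    body, tag = first_packet[:-TAG_LEN], first_packet[-TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)  # constant-time comparison


def route(first_packet, key):
    """Return 'proxy' for authenticated clients and 'fallback' otherwise.

    'fallback' means the untouched bytes are forwarded to the disguise
    service (nginx in the paper), so a prober talks to a real TLS server.
    """
    return "proxy" if is_authentic(first_packet, key) else "fallback"
```

Under these assumptions, a packet carrying a valid tag is routed to the proxy logic, while an ordinary TLS ClientHello or a truncated packet is handed to the fallback path without any protocol-specific error being returned to the sender.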

5.2 Performance comparison

The main purpose of designing the new method and creating this tool is to improve performance and avoid unnecessary consumption of computational resources, so a performance test comparing it with popular tools is necessary.

5.2.1 Test method

The tests were conducted using siege. Each run was a one-minute stress test with a random 0.5 s delay (siege’s default value, to better simulate the distribution of real-world visits). Files of three sizes were used: large (147 MB), medium (22 MB), and small (74 KB). Four levels of concurrency were used: low, medium-low, medium, and high (2, 5, 20, and 150 concurrent threads), representing light use by a single user, heavy use by a single user, light use by multiple users, and heavy use by multiple users, respectively.
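The test matrix described above can be enumerated mechanically. The sketch below only builds the siege command lines and does not run them. The base URL and file names are hypothetical (the paper does not give them), only siege's `-c` (concurrent users) and `-t` (duration) options are used, the delay is left at siege's configured value, and proxy settings are assumed to be configured separately.

```python
from itertools import product

# File sizes and concurrency levels as stated in the test method.
FILES = {"small": "74KB", "medium": "22MB", "large": "147MB"}
CONCURRENCY = {"low": 2, "medium-low": 5, "medium": 20, "high": 150}

# Hypothetical base URL; the paper does not give the actual paths.
BASE_URL = "http://127.0.0.1"


def siege_commands(base_url=BASE_URL):
    """Return one siege argument list per (file size, concurrency) pair."""
    commands = []
    for (name, _size), (_level, users) in product(FILES.items(), CONCURRENCY.items()):
        url = f"{base_url}/{name}.bin"  # hypothetical file naming
        # -c: number of concurrent users, -t 1M: run for one minute.
        commands.append(["siege", "-c", str(users), "-t", "1M", url])
    return commands


if __name__ == "__main__":
    for cmd in siege_commands():
        print(" ".join(cmd))
```

Three file sizes and four concurrency levels give twelve runs per tool and per scheme (HTTP or HTTPS).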

AES-256-GCM is selected for all encryption operations, and TLS is selected if the protection mode is optional.

5.2.2 Test environment

All test data and programs were saved on tmpfs to ensure that local I/O would not become a bottleneck for the test. To keep the hardware specifications consistent and to prevent ambient temperature and humidity from causing performance drift and inaccurate measurements, the tests were conducted on the loopback network of a cloud host, specifically Alibaba Cloud’s compute-intensive ic5 instance ecs.ic5.2xlarge (8 vCPU, 8 GiB).

The hardware environment for the test was an Intel Xeon (Skylake) Platinum 8163 with 8 vCPUs clocked at 2.5 GHz with a turbo frequency of 2.7 GHz, 8 GB of RAM, an intranet bandwidth of 2.5 Gbps, and an intranet packet rate of 800,000 PPS.

The software environment was CentOS 8.1 x86_64 Kernel 4.18.0, nginx/1.17.10.

In this setup, nginx was configured according to the Mozilla Guideline v5.4 recommended medium configuration, with worker_processes=2, worker_connections=1024, keepalive_timeout=60, sendfile=on.

5.2.3 Test results

The software tested included FTLSocks/0.1.0, shadowsocks-go2/0.1.0, shadowsocks-libev/3.3.4.0, Trojan/1.14.0, and V2Ray/4.23.1. Test items included response time (s), throughput (MB/s), maximum request completion time (s), and minimum request completion time (s).

To save space and keep the charts representative, we present all test results for the throughput, the large-file test results for the maximum/minimum request completion time, and the medium-file test results for the response time. Both the encryption and non-encryption scenarios are presented, except for the request completion time, for which only the encryption scenario is presented because the test results for the two scenarios were extremely similar.

An overview of all performance test result charts shows that the general order of performance from best to worst is as follows: direct access, FTLSocks, shadowsocks-go2, shadowsocks-libev, V2Ray with TLS, and Trojan. Contrary to expectations, Trojan’s performance was lower than V2Ray under high concurrency, but higher than V2Ray under low concurrency.

In the throughput test, small files performed poorly under high concurrency, medium files performed normally, and there was a slight decline with large files; when the test pressure was low, the differences between programs were minimal; but as the file size increased, a significant performance gap could be observed even under low concurrency pressure. In either case, FTLSocks and shadowsocks-go2 were the top-performing proxy programs, well ahead of the third-place finisher.

In the tests for the maximum/minimum request completion time and response time, the trends shown in the six charts were basically consistent, with the FTLS protocol demonstrating an advantage in all cases. A horizontal comparison shows that the proxy process of FTLSocks has the least impact on the response time and the least latency time under high concurrency.

Comparing HTTP and HTTPS communication, it can be observed that HTTPS communication exhibits lower performance in all scenarios. However, the relative performance relationships among the tested objects remain almost unchanged, with FTLSocks consistently maintaining a significant advantage. Although the HTTPS traffic forwarding process of FTLSocks reduces the demand for encryption and decryption, it increases the need for selection and judgment, resulting in a sizable but less than expected performance increase.

This and the other sub-figures of Figure 7 are line charts with tables underneath. The line chart shows the same information as the table. In each chart, there are four lines: 2, 5, 20, 150, which correspond to the four rows of the table. The exceptions are the “big” charts, which have only three lines: 2, 5, 20. The x-axis is unlabeled, but the nodes on the graph correspond to the six columns of the table below: original, FTLSocks, ssgo2, sslibev, V2Ray, Trojan. (A line chart is a bad way to represent this data: the shape of each line depends on the ordering of the table columns, which is arbitrary.) The alt text of these figures will report the values in the tables, reading down the columns from left to right.

The title of this sub-figure is “throughput on HTTP small”. original: 0.56, 1.4, 5.71, 42.64. FTLSocks: 0.56, 1.46, 5.66, 41.92. ssgo2: 0.55, 1.4, 5.68, 42.21. sslibev: 0.55, 1.44, 5.68, 42.16. V2Ray: 0.55, 1.4, 5.56, 38.53. Trojan: 0.57, 1.44, 5.65, 34.32.

The title of this sub-figure is “throughput on HTTPS small”. original: 0.55, 1.41, 5.66, 42.13. FTLSocks: 0.54, 1.39, 5.6, 39.74. ssgo2: 0.58, 1.44, 5.62, 39.97. sslibev: 0.55, 1.39, 5.54, 40.93. V2Ray: 0.56, 1.35, 5.47, 30.65. Trojan: 0.54, 1.37, 5.55, 29.35.

The title of this sub-figure is “throughput on HTTP middle”. original: 153.79, 378.67, 1508.15, 2723.96. FTLSocks: 148.71, 365.98, 1159.03, 1258.19. ssgo2: 149.8, 373.59, 1202.55, 1315.7. sslibev: 154.15, 347.84, 523.03, 423.35. V2Ray: 117.18, 281.05, 504.25, 387.74. Trojan: 133.14, 274.57, 310.84, 252.13.

The title of this sub-figure is “throughput on HTTPS middle”. original: 148.74, 375.41, 1174.82, 1222.9. FTLSocks: 145.11, 355.4, 1021.93, 1050.05. ssgo2: 147.65, 351.47, 993.3, 1008.51. sslibev: 149.82, 324.99, 515.41, 398.69. V2Ray: 122.25, 272.03, 473.54, 366.16. Trojan: 128.42, 269.86, 307.58, 218.39.

The title of this sub-figure is “throughput on HTTP big”. There are only three rows in the table and three lines in the chart, not four: 2, 5, 20. original: 621.2, 1481.08, 2768.17. FTLSocks: 551.19, 1096.82, 1290.88. ssgo2: 562.97, 1148.7, 1339.11. sslibev: 494.02, 557.24, 488.81. V2Ray: 289.02, 491.25, 488.81. Trojan: 330.11, 354.38, 293.28.

The title of this sub-figure is “throughput on HTTPS big”. There are only three rows in the table and three lines in the chart, not four: original: 581.97, 1126.7, 1307.56. FTLSocks: 525.82, 950.57, 1082.71. ssgo2: 518.57, 948.13, 1062.98. sslibev: 450.0, 537.69, 488.81. V2Ray: 283.84, 471.62, 457.11. Trojan: 337.45, 344.61, 293.28.

The title of this sub-figure is “longest_transaction on HTTP middle”. original: 0.05, 0.06, 0.19, 1.7. FTLSocks: 0.06, 0.1, 0.37, 5.48. ssgo2: 0.06, 0.09, 0.36, 32.9. sslibev: 0.09, 0.19, 0.87, 7.64. V2Ray: 0.15, 0.22, 0.9, 18.73. Trojan: 0.14, 0.35, 1.41, 12.85.

The title of this sub-figure is “shortest_transaction on HTTPS middle”. original: 0.03, 0.03, 0.03, 0.13. FTLSocks: 0.03, 0.03, 0.05, 0.41. ssgo2: 0.03, 0.03, 0.05, 0.13. sslibev: 0.03, 0.03, 0.08, 4.35. V2Ray: 0.08, 0.08, 0.21, 1.69. Trojan: 0.05, 0.05, 0.44, 9.99.

The title of this sub-figure is “response_time on HTTP big”. There are only three rows in the table and three lines in the chart, not four: original: 0.21, 0.25, 0.79. FTLSocks: 0.28, 0.42, 1.99. ssgo2: 0.26, 0.39, 1.9. sslibev: 0.37, 1.05, 5.51. V2Ray: 0.76, 1.24, 5.5. Trojan: 0.63, 1.77, 8.73.

The title of this sub-figure is “response_time on HTTPS middle”. original: 0.04, 0.04, 0.12, 2.37. FTLSocks: 0.05, 0.05, 0.17, 2.79. ssgo2: 0.05, 0.05, 0.18, 2.75. sslibev: 0.05, 0.08, 0.6, 7.34. V2Ray: 0.12, 0.15, 0.66, 8.02. Trojan: 0.08, 0.15, 1.16, 12.17.

Figure 7: Performance test groups

By comparing different concurrency scenarios, it can be found that when the number of concurrent threads reaches 150, the throughput of shadowsocks-libev, V2Ray, and Trojan decreases rather than increases, indicating that they have reached a concurrency performance bottleneck. In contrast, the throughput of FTLSocks and shadowsocks-go2 still increases slightly, with FTLSocks showing a slightly larger increase, which demonstrates the performance advantage of the FTLS protocol.
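The bottleneck observation can be checked directly against the reported numbers. The sketch below takes the “throughput on HTTPS middle” values from Figure 7 at 20 and 150 threads and computes the relative change for each tool; the helper names `relative_change` and `classify` are introduced here for illustration only.

```python
# Throughput for the "HTTPS middle" sub-figure of Figure 7,
# at 20 and 150 concurrent threads respectively.
HTTPS_MIDDLE = {
    "FTLSocks": (1021.93, 1050.05),
    "shadowsocks-go2": (993.3, 1008.51),
    "shadowsocks-libev": (515.41, 398.69),
    "V2Ray": (473.54, 366.16),
    "Trojan": (307.58, 218.39),
}


def relative_change(before, after):
    """Relative throughput change when going from 20 to 150 threads."""
    return (after - before) / before


def classify(data):
    """Map each tool to 'scales' or 'bottleneck' by the sign of the change."""
    return {
        tool: "scales" if relative_change(low, high) > 0 else "bottleneck"
        for tool, (low, high) in data.items()
    }


if __name__ == "__main__":
    for tool, (low, high) in HTTPS_MIDDLE.items():
        print(f"{tool:18s} {relative_change(low, high):+.1%}")
```

For this sub-figure, FTLSocks gains about 2.8% and shadowsocks-go2 about 1.5%, while the other three tools lose throughput.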

5.2.4 Comparison of system resource usage

Here we mainly compare client-side data, as shown in Table 2. The server side generally differs little from the client side and has relatively sufficient computational and space resources, so for the sake of brevity, it is omitted from Table 2.

Table 2: System Resource Consumption
Metric | FTLSocks | shadowsocks-go2 | shadowsocks-libev | V2Ray | Trojan
Size (MB) | 3.1 | 1.1 | 5.0 | 9.1 | 1.7
Idle memory usage (MB) | 34.4 | 3.2 | 1.6 | 55.5 | 4.1
Memory usage after stress testing (MB) | 54.6 | 9.6 | 2.0 | 78.1 | 17.0
CPU consumption per unit of data in HTTP stress testing | 21.73 | 19.68 | 22.29 | 139.43 | 18.59
CPU consumption per unit of data in HTTPS stress testing | 23.23 | 25.32 | 22.32 | 143.64 | 19.81
(Note: Apart from the 32 MB consumed by Argon2id to calculate the derived key, FTLSocks consumes only 2.4 MB. The data listed here are all client-side data. The server-side differences are very small (less than 0.01 in absolute terms), except in the HTTP case, where the FTLSocks client has an additional 0.33 CPU overhead.)

Detailed information on CPU resource consumption is shown in Tables 3 and 4. File sizes are compared after removing debugging information and symbol tables and then compressing with UPX. Anything other than a single executable file is measured by its size after compression in 7z format. For statically linked programs, only the executable file itself is counted; for dynamically linked programs, the dynamic link library files are counted as well.

Table 3: CPU consumption in the HTTP scenario
Category | Client user space | Client kernel space | Client total utilization | Server user space | Server kernel space | Server total utilization | Throughput
FTLSocks | 9.78 | 11.95 | 21.73 | 8.98 | 12.41 | 21.40 | 1.11
shadowsocks-go2 | 8.32 | 11.36 | 19.68 | 8.32 | 11.36 | 19.68 | 1.15
shadowsocks-libev | 11.44 | 10.85 | 22.29 | 11.44 | 10.85 | 22.29 | 0.49
V2Ray | 93.50 | 45.93 | 139.43 | 93.50 | 45.93 | 139.43 | 0.30
Trojan | 52.70 | 12.00 | 18.59 | 6.59 | 12.00 | 18.59 | 0.47

Table 4: CPU consumption in the HTTPS scenario
Category | Client user space | Client kernel space | Client total utilization | Server user space | Server kernel space | Server total utilization | Throughput
FTLSocks | 10.62 | 12.61 | 23.23 | 10.62 | 12.61 | 23.23 | 0.92
shadowsocks-go2 | 10.70 | 14.62 | 25.32 | 10.70 | 14.62 | 25.32 | 0.89
shadowsocks-libev | 11.46 | 10.86 | 22.32 | 11.46 | 10.86 | 22.32 | 0.49
V2Ray | 96.32 | 47.31 | 143.64 | 96.32 | 47.31 | 143.64 | 0.29
Trojan | 7.02 | 12.79 | 19.81 | 7.02 | 12.79 | 19.81 | 0.44

Memory usage is compared in two states: at startup with the default client configuration, and after a stress test. The stress test used for comparison is a 1-minute HTTP stress test with 20 threads and medium-sized files. These conditions were chosen because, according to the performance test data, the tools performed best under them.

The method of comparing CPU utilization is as follows: first, perform a 1-minute stress test with 20 threads and medium-sized files, then calculate the average CPU utilization, and finally compare the CPU resources consumed per unit of data in the cases of HTTP and HTTPS. CPU utilization data is collected using pidstat, and throughput data is collected using siege, with the data being independent of the performance test phase. The CPU consumption per unit of data is calculated by dividing the average CPU utilization (%) by the throughput (GB/s). Considering that larger throughput will inevitably consume more computational resources for encryption and decryption, this calculation method is quite fair.
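The normalization described above is a single division. The sketch below shows why it matters when throughput differs between tools. The function name `cpu_per_unit` and the input values are hypothetical and are not measurements from the paper.

```python
def cpu_per_unit(avg_cpu_percent, throughput_gb_per_s):
    """CPU consumption per unit of data: average CPU utilization (%) / throughput (GB/s)."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return avg_cpu_percent / throughput_gb_per_s


# Hypothetical measurements, not taken from the paper:
# two tools with the same raw CPU utilization but different throughput.
fast = cpu_per_unit(avg_cpu_percent=25.0, throughput_gb_per_s=1.25)
slow = cpu_per_unit(avg_cpu_percent=25.0, throughput_gb_per_s=0.5)
```

With these illustrative inputs, the faster tool is charged 20.0 per GB/s and the slower one 50.0, even though both show the same raw utilization in pidstat.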

As can be seen from Table 2, FTLSocks has no advantage in memory footprint or file size over shadowsocks-libev and Trojan, which are written in C/C++, or over shadowsocks-go2, which has relatively simple logic. In terms of CPU utilization, FTLSocks requires relatively few resources. On the other hand, both its memory footprint and its file size are small enough for it to work well and perform well on resource-constrained embedded devices.

It can be seen that the symmetric dimensions on the radar charts differ little, except for Trojan in the HTTP scenario, indicating that CPU resource consumption on the client and the server is almost symmetric. Three points deserve specific mention: the FTLSocks client consumes slightly more than the server; V2Ray consumes the most, far exceeding the others; and although the client and the server are generally on par, Trojan stands out in user-space CPU usage in the HTTP scenario. The details are shown in Tables 3 and 4 and Figure 8.

The title of this sub-figure is “CPU Consume on HTTP”. It is a radar chart that depicts 6 variables, Client Rate user, Client Rate system, Client Rate all, Server Rate user, Server Rate system, Server Rate all; across 5 conditions: FTLSocks, shadowsocks-go2, shadowsocks-libev, V2Ray, Trojan. The scale goes from 0.00 to 140.00 in increments of 20.00. All the conditions are pretty similar, clustered near the center at values of about 20.00 for all variables. The exceptions are (Trojan, Client Rate user), which is at about 50.00; and V2Ray, which has higher values in general, ranging from 40.00 to 140.00.

The title of this sub-figure is “CPU Consume on HTTPS”. It is a radar chart that depicts 6 variables, Client Rate user, Client Rate system, Client Rate all, Server Rate user, Server Rate system, Server Rate all; across 5 conditions: FTLSocks, shadowsocks-go2, shadowsocks-libev, V2Ray, Trojan. The scale goes from 0.00 to 140.00 in increments of 20.00. All the conditions are pretty similar, clustered near the center at values of about 20.00 for all variables. The exception is V2Ray, which has higher values in general, ranging from 40.00 to 140.00.
Figure 8: CPU consumption radar chart

In addition, it is worth mentioning that even when tested repeatedly with large amounts of data and high concurrency, FTLSocks does not consume more memory than it does after only a 1-minute stress test (the margin of error across multiple tests is approximately 1 MB).

5.2.5 Analysis

Let us analyze the specific reasons by combining the source code and the performance test data. First, it should be noted that the master key this tool uses to encrypt and decrypt configuration files is derived with Argon2id. This algorithm deliberately consumes memory in order to resist attacks from ASICs, GPUs, and HPC clusters. In this implementation, 32 MB was chosen as the memory parameter; therefore, the actual memory consumed in the idle state can be considered to be 2.4 MB. After the key derivation is completed, this space is managed by the Go runtime as heap space.

Shadowsocks has a simple structure and is a 0-RTT protocol with a pre-shared key, so it is only natural that its performance exceeds that of this tool in all aspects. This is merely provided as a reference for the performance standard of a popular tool. Therefore, the focus of the analysis is to compare the TLS mode of Trojan and V2Ray with this tool.

Compared with V2Ray, it can be found that this tool has a lower CPU utilization but better performance. Although the memory utilization is relatively high, it is still within acceptable limits. By analyzing V2Ray’s TLS disguise implementation, it can be found that, in this scenario, V2Ray performs two encryptions, uses TLS in its entirety, and does a lot of extra work including routing, DNS, connection multiplexing, etc., to maintain versatility. That’s why V2Ray consumes more CPU resources.

While Trojan has an advantage in memory and CPU utilization, it also suffers a significant decrease in performance. From the performance test data, it can be found that Trojan’s performance is close to or slightly higher than that of the other tools only when the level of concurrency is low.

This also shows the reason why most of the performance tests circulating on the Internet conclude that Trojan has higher performance: they only tested usage scenarios under low concurrent load (single user, single proxy target).

By examining the source code, it can be found that this phenomenon is essentially due to two reasons:

  1. Weak utilization of multi-core computing power;
  2. A complete TLS stack.

FTLSocks uses a more simplified protocol design, fundamentally ensuring higher throughput and better performance. As can be seen from the test data, whether in terms of throughput, response time, or the maximum/minimum request completion time, it holds a considerable advantage.

In terms of implementation, FTLSocks uses a coroutine pool to enhance concurrency performance, an object pool to reduce memory allocation, and a contention-free and lock-free design, so it can fully utilize CPU performance and achieve higher throughput under the same stress. The downside is that it takes up slightly more memory. In long-running, large-scale concurrency scenarios, using a coroutine pool is advantageous in terms of both computational and memory resource consumption.
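The pooling idea can be illustrated with a small sketch. This is a Python asyncio analogue and not the FTLSocks implementation, which runs on the Go runtime; `BufferPool`, `worker`, and `run` are hypothetical names. A fixed set of long-lived workers takes jobs from a queue, and each job borrows a preallocated buffer instead of allocating a new one.

```python
import asyncio


class BufferPool:
    """Reuse fixed-size buffers instead of allocating one per request."""

    def __init__(self, size, count):
        self._size = size
        self._free = [bytearray(size) for _ in range(count)]
        self.allocations = count  # total number of buffers ever created

    def get(self):
        if self._free:
            return self._free.pop()
        self.allocations += 1  # pool exhausted: fall back to a fresh buffer
        return bytearray(self._size)

    def put(self, buf):
        self._free.append(buf)


async def worker(queue, pool, results):
    """Long-lived worker: processes jobs from the queue until cancelled."""
    while True:
        payload = await queue.get()
        buf = pool.get()
        try:
            n = min(len(payload), len(buf))
            buf[:n] = payload[:n]                   # stand-in for reading into the buffer
            results.append(bytes(buf[:n]).upper())  # stand-in for the real work
        finally:
            pool.put(buf)
            queue.task_done()


async def run(payloads, workers=4):
    """Process all payloads with a fixed number of workers and pooled buffers."""
    queue = asyncio.Queue()
    pool = BufferPool(size=4096, count=workers)
    results = []
    tasks = [asyncio.create_task(worker(queue, pool, results)) for _ in range(workers)]
    for p in payloads:
        queue.put_nowait(p)
    await queue.join()  # wait until every job has been processed
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results, pool.allocations
```

In this sketch the number of buffers stays at the pool size no matter how many payloads are processed, which mirrors the trade-off described above: a somewhat higher fixed memory cost in exchange for fewer allocations under sustained load.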

Although FTLSocks has a slight disadvantage in terms of CPU consumption per unit of data, it is still roughly at the same level. Considering the rich security features and excellent performance provided, FTLSocks should be considered a better design.

In addition, due to the nature of the FTLS protocol, no real TLS connection is ever established, thus allowing for two additional benefits. (1) It can leverage the HTTPS reverse proxy feature of nginx to use other HTTPS websites around the world as reverse proxy targets, saving effort in setting up and running local services and reducing system resource consumption; (2) it can serve as a backend for the reverse proxy to achieve better performance and higher availability with the assistance of load balancing.

6 Application scenarios and discussion

6.1 Scenarios

Rapid deployment: The simplest way to deploy FTLSocks requires only a domain name and a server. The deployment process involves just three steps: generating a configuration file, uploading it, and running it to provide proxy services. Legal certificates and disguise services are not mandatory. However, using this method also results in the lowest level of concealment and availability. The usage methods described below can fully leverage the power of the FTLS protocol, greatly enhancing concealment, availability, and security.

Service cluster: The minimalist design of FTLSocks makes deploying service clusters at scale extremely easy. The rapid deployment method can be easily extended to the deployment of service clusters.

Rapid recovery: By establishing a forwarding network, a single point of failure can be mitigated, and access can be quickly restored if the FTLS server becomes inaccessible. This network requires at least two forwarders and one server; a better structure should include at least three forwarders and two servers. More forwarders are preferable if conditions permit. During operation, at least one forwarder should always remain unused except in emergencies, to serve as an emergency entry point for restoring access. The general structure and communication process of this forwarding network is shown in Figure 9. The URGE connection is never used outside of an emergency, and Relay forwards traffic to the real FTLS service, thereby hiding the IP address of the proxy server from the attacker’s observation. This usage is not part of the FTLS protocol, but this aspect has been considered in the design of the protocol to ensure the smooth operation of the forwarding network.

A network topology diagram. On the left is a laptop computer icon labeled “User”. There are two arrows that leave User and go to an icon of a firewall: one arrow labeled “URGE” and one labeled “Normal”. Beyond the firewall there are three desktop computer icons labeled “Relay1”, “Relay2”, and “Relay3”. An arrow from the firewall to Relay1 is labeled “URGE”; arrows to Relay2 and Relay3 are unlabeled. On the right are two server rack icons labeled “FTLS1” and “FTLS2”. Relay1 points to FTLS1 on an arrow labeled “URGE”. Relay2 and Relay3 each point to both of FTLS1 and FTLS2.
Figure 9: Forwarding network topology diagram
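The client-side selection rule implied by the rapid recovery scenario can be sketched as follows. This is an illustration only: `pick_relay` and the `reachable` callback are hypothetical, and the forwarding network itself is, as noted, not part of the FTLS protocol.

```python
def pick_relay(normal, emergency, reachable):
    """Choose an entry point: normal relays first, the reserved one only as a last resort.

    normal     -- relay addresses used in day-to-day operation
    emergency  -- relay address kept unused except in emergencies ("URGE" in Figure 9)
    reachable  -- callable returning True if a relay currently accepts connections
    """
    for relay in normal:
        if reachable(relay):
            return relay, False
    if reachable(emergency):
        # Emergency path in use: the normal relays should be replaced soon.
        return emergency, True
    raise ConnectionError("no relay reachable")
```

For example, with Relay2 and Relay3 as normal entries and Relay1 reserved, the function returns Relay1 only after both normal relays have become unreachable, so the reserved address stays unobserved for as long as possible.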

Local intrusion: The risk of an attacker obtaining the master key from memory after an intrusion can be reduced by requesting certificates, deploying disguise services, using strong master passwords, running with dedicated non-privileged users, only allowing non-privileged users to log in via SSH with password-protected private keys, and not running other services.

Traffic shunting: Placing software similar to AutoProxy before FTLSocks can achieve traffic shunting, thereby avoiding unnecessary proxy actions and improving user experience.
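A shunting decision of this kind can be as simple as a domain-suffix match made before a connection is handed to the proxy. The sketch below is a deliberately simplified illustration; it does not implement AutoProxy's rule format, and `needs_proxy` and the example rule set are hypothetical.

```python
def needs_proxy(host, proxied_suffixes):
    """Return True if host equals, or is a subdomain of, any listed suffix."""
    host = host.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in proxied_suffixes)


# Hypothetical rule set: only these domains are sent through the proxy.
RULES = {"example.org", "example.net"}
```

With these rules, `www.example.org` is proxied, while `notexample.org` and any unlisted domain are accessed directly.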

6.2 Discussion

The configuration wizard provided by FTLSocks allows adjustment of certain parameters. These parameters have a bearing on performance and traffic load capacity, while also imparting specific characteristics to the traffic (e.g., performance degradation curves, time-out period, etc.). To avoid attacks from traffic analysis techniques based on traffic fingerprints, concealment can be further enhanced by not using default parameter values. In addition, the KeepAlive and Deadline parameter values provided by the configuration wizard are common defaults and can be left unchanged.

“First-packet authentication” combined with the “forwarding failure” mechanism used in FTLSocks requires a disguise service in its design, and the choice of this service has a significant bearing on concealment. Although TLS-based disguise cannot be distinguished at the application layer, it is still possible to distinguish it in specific scenarios. To mitigate this problem, two strategies can be employed: making it impossible for attackers to determine the nature of the traffic and disguising the statistical characteristics of specific scenarios. Taking the first strategy as an example, the level of suspicion can be reduced by using a reasonable disguise service (e.g., an API service with authentication) at the proxy’s disguise exit. The key point here is that unauthorized users of API services can only get very short error messages and cannot determine the patterns of changes in the length and nature of subsequent content, whereas web-based services have a large number of static resources or dynamic resources with very low update frequencies, and the size and timing of access to these resources are predictable. Furthermore, a long-term connection to a specific web service is suspicious, while long-term use of an API service is acceptable. The second strategy is exemplified by NaïveProxy, which directly uses the network stack modules from the Chromium open-source project, allowing it to obtain the statistical characteristics of the Chrome browser.

From an attacker’s perspective, the best way to identify a disguised target is to use a user profiling–based method to observe activities in the target network for a long time[36], and make a comprehensive judgment using the seven proxy indicators mentioned in Section 2.2. These indicators are features that are bound to appear in the network traffic of proxy users. Judicious use of proxies and appropriate configuration of the local network and disguise service can effectively thwart this attempt.

In addition to the deployment of the proxy, the environment and manner in which it is used also have a bearing on concealment. For example, blocking WebRTC when using a proxy in a browser can prevent your local IP address from being leaked when you access certain websites. Using a CDN to route proxy traffic through a legitimate path can also enhance concealment, but existing CDNs require the private key of the source website’s certificate to provide services, which is a concession that fundamentally represents a security flaw in the design of the CDN technology. By adopting the CDN scheme proposed in reference [35], which allows CDN providers and source websites to independently maintain their own certificates and private keys, it would be much safer to use a CDN as a proxy service.

In summary, FTLSocks can achieve greater concealment, security, and availability by being used in specific ways. In future work, we can try to provide higher reliability in complex network environments by integrating the Turbo Tunnel concept[33], and reduce the possibility of side channel–enabled identification[37] by proper padding.

7 Conclusion

By designing a new disguise protocol, FTLSocks balances throughput, response time, and security, ultimately achieving a significant improvement in performance and security while keeping increases in CPU and memory resource consumption minimal. The new protocol achieves security similar to Trojan and performance close to Shadowsocks, while avoiding the inherent flaws of Trojan in reverse proxying. As a new disguise proxy protocol, FTLS is a superior option compared to existing popular protocols.

Acknowledgments. We would like to thank Professor 陈嘉耕 (Chen Jiageng) for his meticulous guidance during the writing of this paper.

Note: All figures in the text are original creations by the authors, and the text annotations are in English due to space constraints. Hereby clarified.

References

[1]
Elahi T, Goldberg I. CORDON—A Taxonomy of Internet Censorship Resistance Strategies[EB/OL].
[2]
Ensafi R, Fifield D, Winter P, et al. Examining how the Great Firewall Discovers Hidden Circumvention Servers[C]. The 2015 Internet Measurement Conference, 2015: 445-458.
[3]
Berger T. Analysis of Current VPN Technologies[C]. First International Conference on Availability, Reliability and Security, 2006: 8pp.-115.
[4]
Karlin J, Ellard D, Jackson A W, et al. Decoy Routing: Toward Unblockable Internet Communication[C]. FOCI, 2011.
[5]
Manfredi V, Songkuntham P. MultiFlow: Cross-Connection Decoy Routing using TLS 1.3 Session Resumption[C]. 8th USENIX Workshop on Free and Open Communications on the Internet, 2018.
[6]
Schuchard M, Geddes J, Thompson C, et al. Routing around Decoys[C]. CCS ’12: The 2012 ACM conference on Computer and communications security. 2012: 85-96.
[7]
Wustrow E, Swanson C M, Halderman J A. Tapdance: End-to-middle anticensorship without flow blocking[C]. 23rd USENIX Security Symposium, 2014: 159-174.
[8]
Nasr M, Zolfaghari H, Houmansadr A. The Waterfall of Liberty: Decoy Routing Circumvention that Resists Routing Attacks[C]. CCS ’17: The 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017: 2037-2052.
[9]
Chen J L, Trang Nguyen U. A Robust Protocol for Circumventing Censoring Firewalls[C]. 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), 2018: 1798-1805.
[10]
Brubaker C, Houmansadr A, Shmatikov V. Cloudtransport: Using cloud storage for censorship-resistant networking[C]. International Symposium on Privacy Enhancing Technologies Symposium. Springer, Cham, 2014: 1-20.
[11]
Fifield D, Lan C, Hynes R, et al. Blocking-Resistant Communication through Domain Fronting[J]. Proceedings on Privacy Enhancing Technologies, 2015, 2015(2): 46-64.
[12]
Shbair W M, Cholez T, Goichot A, et al. Efficiently Bypassing SNI-Based HTTPS Filtering[C]. 2015 IFIP/IEEE International Symposium on Integrated Network Management, 2015: 990-995.
[13]
Barradas D, Santos N, Rodrigues L. DeltaShaper: Enabling Unobservable Censorship-Resistant TCP Tunneling over Videoconferencing Streams[J]. Proceedings on Privacy Enhancing Technologies, 2017, 2017(4): 5-22.
[14]
Deri L C, Martinelli M, Bujlow T, et al. NDPI: Open-Source High-Speed Deep Packet Inspection[C]. 2014 International Wireless Communications and Mobile Computing Conference, 2014: 617-622.
[15]
Gancheva Z, Sattler P, Wüstrich L. TLS Fingerprinting Techniques[J]. Network, 2020, 15.
[16]
Zhao J Y. Research on Censorship Circumvention Anonymous Techniques[D]. Beijing: Beijing University of Posts and Telecom, 2016.
(赵进洋. 审查规避匿名技术研究[D]. 北京: 北京邮电大学, 2016)
[17]
Frolov S, Wampler J, Wustrow E. Detecting Probe-resistant Proxies[C]. Network and Distributed System Security. The Internet Society, 2020.
[18]
Zhiniang Peng. Redirect attack on Shadowsocks stream ciphers. https://github.com/edwardz246003/shadowsocks, Mar. 2019.
[19]
McPherson R, Houmansadr A, Shmatikov V. CovertCast: Using Live Streaming to Evade Internet Censorship[J]. Proceedings on Privacy Enhancing Technologies, 2016, 2016(3): 212-225.
[20]
Li S, Schliep M, Hopper N. Facet: Streaming over Videoconferencing for Censorship Circumvention[C]. The 13th Workshop on Privacy in the Electronic Society, 2014: 163-172.
[21]
Houmansadr A, Brubaker C, Shmatikov V. The Parrot is Dead: Observing Unobservable Network Communications[C]. 2013 IEEE Symposium on Security and Privacy, 2013: 65-79.
[22]
Frolov S, Wustrow E. The use of TLS in Censorship Circumvention[C]. NDSS, 2019.
[23]
Ptacek T H, Newsham T N. Insertion, evasion, and denial of service: Eluding network intrusion detection[R]. Secure Networks inc Calgary Alberta, 1998.
[24]
Appelbaum J. Technical analysis of the Ultrasurf proxying software[J]. The Tor Project, 2012.
[25]
Lotfollahi M, Jafari Siavoshani M, Shirali Hossein Zade R, et al. Deep Packet: A Novel Approach for Encrypted Traffic Classification Using Deep Learning[J]. Soft Computing, 2020, 24(3): 1999-2012.
[26]
Saber A, Fergani B, Abbas M. Encrypted Traffic Classification: Combining Over-and Under-Sampling through a PCA-SVM[C]. 2018 3rd International Conference on Pattern Analysis and Intelligent Systems, 2018: 1-5.
[27]
Zeng X M, Chen X S, Shao G L, et al. Flow Context and Host Behavior Based Shadowsocks’s Traffic Identification[J]. IEEE Access, 2019, 7: 41017-41032.
[28]
Liu C, Cao Z G, Xiong G, et al. MaMPF: Encrypted Traffic Classification Based on Multi-Attribute Markov Probability Fingerprints[C]. 2018 IEEE/ACM 26th International Symposium on Quality of Service, 2018: 1-10.
[29]
Zhang Y D, Chen J G, Chen K M, et al. Network Traffic Identification of Several Open Source Secure Proxy Protocols[J]. International Journal of Network Management, 2021, 31(2): e2090.
[30]
Wang L, Dyer K P, Akella A, et al. Seeing through Network-Protocol Obfuscation[C]. CCS ’15: The 22nd ACM SIGSAC Conference on Computer and Communications Security. 2015: 57-69.
[31]
Deng Z Y, Liu Z H, Chen Z G, et al. The Random Forest Based Detection of Shadowsock’s Traffic[C]. 2017 9th International Conference on Intelligent Human-Machine Systems and Cybernetics, 2017: 75-78.
[32]
Yang Y, Kang C C, Gou G P, et al. TLS/SSL Encrypted Traffic Classification with Autoencoder and Convolutional Neural Network[C]. 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems, 2018: 362-369.
[33]
Fifield D. Turbo Tunnel, a good way to design censorship circumvention protocols[C]. 10th USENIX Workshop on Free and Open Communications on the Internet, 2020.
[34]
Frolov S, Wustrow E. HTTPT: A Probe-Resistant Proxy[C]. 10th USENIX Workshop on Free and Open Communications on the Internet, 2020.
[35]
Wang Z, Li W Q, Cai Q W. Delegation Transparency for HTTPS with CDNS[J]. Journal of Cyber Security, 2018, 3(2): 16-30.
(王泽, 李文强, 蔡权伟. 面向 HTTPS 的内容分发网络代理关系透明化[J]. 信息安全学报, 2018, 3(2): 16-30.)
[36]
Han Z H, Chen X S, Zeng X M, et al. Detecting Proxy User Based on Communication Behavior Portrait[J]. The Computer Journal, 2019, 62(12): 1777-1792.
[37]
Gao P, Guang H, Chen X, Li G S. Traffic Identification based on Side-channel Features for Security Proxy[J]. Computer Engineering, 2020: 1-10.
(高平, 广晖, 陈熹, 李光松. 基于侧信道特征的安全代理流量分类[J/OL]. 计算机工程, 2020: 1-10)

Biography photo of 吕英豪 (Lü Yinghao)

吕英豪 (Lü Yinghao) is currently pursuing a bachelor’s degree in Information Security at the School of Computer Science, Central China Normal University. His research interests include network information security and security protocol design. Email: huraway@mails.ccnu.edu.cn

Biography photo of 陈嘉耕 (Chen Jiageng)

陈嘉耕 (Chen Jiageng) received his Ph.D. in Information Science from the Japan Advanced Institute of Science and Technology, a national university in Japan, in March 2012. He is currently an associate professor at the School of Computer Science, Central China Normal University. His research interests include cybersecurity protocols, algorithm analysis, etc. Email: jiageng.chen@mail.ccnu.edu.cn