Detection and prevention of DNS spoofing attacks

Abstract

DNS is the foundation of most network applications, and an attack on it would affect the normal operation of the entire Internet. DNS spoofing is a tactic commonly used by attackers. It is stealthy, broad in scope, and effective. At present, there is no good defense strategy for this kind of attack. On the basis of an analysis of the principle of DNS spoofing, this paper proposes three attack detection methods and three methods for identifying attack packets, which have a positive effect on improving DNS security and making DNS more resistant to attacks.

DNS is a distributed database system for managing the mapping between host names and address information. DNS makes hard-to-remember numeric IP addresses much easier to use by associating them with names that are easy to remember and understand. DNS is the foundation of most network applications, but due to design flaws in the protocol itself^[1], DNS does not provide proper information protection and authentication mechanisms, making it vulnerable to attacks. In March 2005, the Internet Storm Center (ISC) of the SANS Institute issued a warning about DNS spoofing attacks, and a new round of attacks saw a large number of dot-coms fall victim, with at least 1,300 domain names lured to compromised servers. BIND domain name systems ranked first among Unix- and Linux-related security vulnerabilities in the list of the top 20 cybersecurity vulnerabilities in 2004 released by the SANS Institute, a computer security organization^[2]. Protecting against DNS attacks is therefore imperative for ensuring DNS security.

Many scholars have been exploring the issue of DNS security and have proposed solutions to inherent security flaws in the DNS protocol. The Domain Name System Security Working Group of the IETF put forward Domain Name System Security Extensions (DNSSEC), which adds an authentication mechanism to enhance the security of the protocol. However, DNSSEC still has some problems in system efficiency, key management, etc., and is still some distance away from widespread deployment. Therefore, in addition to security research on the DNS protocol itself, many articles explore security solutions that work within the existing protocol, mainly passive preventive measures such as upgrading server software, strictly configuring DNS, and prohibiting related functions^[3]. However, necessary solutions are lacking for some hard-to-avoid attacks such as DNS spoofing attacks.

1 Principle of DNS spoofing attacks

DNS, as a basic service on the Internet, is subject to threats from various sources. There are mainly the following types of attacks on DNS (as shown in Table 1), each of which has its own characteristics, as can be seen from the comparison.

As can be seen from Table 1, both DNS spoofing and cache poisoning attacks use deception, and both are relatively easy to implement, so these two types of attacks are also the most harmful. In addition, DNS spoofing mainly exploits authentication flaws in the protocol itself and is difficult to prevent. Cache poisoning, on the other hand, depends more on vulnerabilities in the DNS server software itself; as long as the software is upgraded to the latest version and strictly configured, the ability to prevent such attacks improves significantly.

Table 1: Comparison of DNS attacks
DNS attack type	Active or passive	Amount of attack traffic	What is attacked	Means of attack	Degree of difficulty
DNS spoofing	Passive	Small	Client/Server	Deception	Easiest
Cache poisoning	Active	Large	Server	Deception	Easy
Server compromise	Active	Small	Server	Hacking by exploiting vulnerabilities	Most difficult
Denial of service	Active	Largest	Server	Draining resources	Difficult

Some scholars also refer to cache poisoning attacks as DNS spoofing attacks^[4]. To clearly distinguish between these two types of attacks, we do not include cache poisoning attacks in the DNS spoofing attacks referred to in this paper, nor do we make cache poisoning the focus of discussion in this paper.

1.1 Principle of DNS Resolution

Before analyzing the principle of DNS spoofing attacks, let’s find out how DNS works. Assume that the domain name to be queried is www.hit.edu.cn, and assume that the client and the preferred DNS server meet the following conditions.

The preferred DNS server and the client start up for the first time and have no locally cached information.
The preferred DNS server is not an authoritative domain name server for the target domain name.

The specific query process is shown in Figure 1 with the following steps:

The client first sends a recursive query for www.hit.edu.cn to the preferred DNS server.
The preferred DNS server checks its local resource records. If the name is found there, the preferred DNS server responds with an authoritative answer; if not, it checks its local cache and, if the name is cached, returns the result directly. If the name is in neither the local resource records nor the cache, the server sends an iterative query to the root server.
The root server returns the address of the authoritative domain name server for the CN domain, and the preferred DNS server continues to iteratively query the CN authoritative server.
The authoritative CN domain server returns the address of the authoritative domain name server in the edu.cn domain. The preferred DNS server iterates the query until it gets an authoritative response for the domain name www.hit.edu.cn, saves it in the local cache, and returns it to the client to complete the query.

A diagram of recursive DNS resolution. The “DNS client” contacts a “Preferred DNS server” (step 1), which in turn contacts “Other DNS servers”: “Root Server” (steps 2 and 3), “Cn” (steps 4 and 5), “Edu” (steps 6 and 7), and “Hit” (steps 8 and 9). Finally the Preferred DNS server replies to the DNS client in step 10. — Figure 1: Domain name resolution process
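
The referral chain in Figure 1 can be illustrated with a toy resolver. The sketch below is a minimal in-memory simulation, not a real DNS implementation: the `DELEGATIONS` and `AUTHORITATIVE` tables, the server names, and the address 192.0.2.10 (a documentation address) are hypothetical stand-ins for the root, cn, edu.cn, and hit.edu.cn servers, and no network traffic is sent.

```python
# Toy model of the resolution in Figure 1. All server names and the
# address below are hypothetical; no network traffic is sent.
DELEGATIONS = {
    "root": {"cn": "cn-server"},
    "cn-server": {"edu.cn": "edu-cn-server"},
    "edu-cn-server": {"hit.edu.cn": "hit-server"},
}
AUTHORITATIVE = {
    "hit-server": {"www.hit.edu.cn": "192.0.2.10"},
}


def iterate(name, server="root"):
    """Follow referrals from the root until an authoritative answer is found."""
    path = [server]
    while True:
        records = AUTHORITATIVE.get(server, {})
        if name in records:
            return records[name], path
        # Find a delegation whose zone is a suffix of the queried name.
        referral = next(
            (target for zone, target in DELEGATIONS.get(server, {}).items()
             if name == zone or name.endswith("." + zone)),
            None,
        )
        if referral is None:
            raise LookupError(f"{server} has no referral for {name}")
        server = referral
        path.append(server)


class PreferredServer:
    """Preferred DNS server: answers from its cache, otherwise iterates."""

    def __init__(self):
        self.cache = {}

    def resolve(self, name):
        if name not in self.cache:
            address, _path = iterate(name)
            self.cache[name] = address  # save the answer for later queries
        return self.cache[name]
```

Calling `PreferredServer().resolve("www.hit.edu.cn")` walks root, cn, edu.cn, and hit.edu.cn once; a second call for the same name is answered from the cache.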

1.2 Principle of DNS Spoofing Attacks

Due to flaws in the design of the DNS protocol, only a sequence number (the ID field of the message header) is used in DNS messages for validation, and no other means of authentication or protection is provided. This makes it easy for an attacker to eavesdrop on a query request and send a fake DNS response packet to the DNS client, thus carrying out a DNS spoofing attack.

The current way for all DNS clients to handle DNS response packets is simply to trust the packets that arrive first and discard all the ones that arrive later, without any analysis of the packet's legitimacy. In this way, spoofing can be achieved by simply ensuring that the spoofed packet arrives before the legitimate packet, which is usually very easy to achieve. DNS spoofing attacks may exist between a client and a DNS server, or between DNS servers, but they work in the same way, as shown in Figure 2.

Two end-user computers labeled “Attacked” and “Attacker” are connected to a communications line, along with a server labeled “Preferred DNS server”. In step 1, Attacked sends a request packet towards the preferred DNS server. In step 2, Attacker sees the request packet on the monitor and sends back a spoofed response packet. In step 3, the preferred DNS server sends a legitimate response packet. — Figure 2: DNS spoofing attack

Let’s still use www.hit.edu.cn as an example, assuming the fake IP address is 1.2.3.4. The specific spoofing process is as follows:

The DNS client sends a recursive resolution request for www.hit.edu.cn to the preferred DNS server.
The attacker observes the request and, using the request ID, sends a fake response packet to the requester, stating that the IP address corresponding to www.hit.edu.cn is 1.2.3.4.
The preferred DNS server returns a correct response, but the result is discarded because it arrives later than the attacker’s response.
The attack is complete and the client’s access to www.hit.edu.cn is redirected to 1.2.3.4.
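
The client behavior that makes these steps work can be sketched in a few lines. This is a minimal model under stated assumptions: the `Response` class, the function `naive_client_accept`, the transaction ID, and the arrival times are all hypothetical, and 192.0.2.10 stands in for the real address of www.hit.edu.cn.

```python
from dataclasses import dataclass


@dataclass
class Response:
    query_id: int      # DNS transaction ID copied from the request
    address: str       # address claimed for the queried name
    arrival_ms: float  # arrival time at the client, in milliseconds


def naive_client_accept(request_id, responses):
    """Return the first-arriving response whose ID matches; later ones are ignored."""
    matching = [r for r in responses if r.query_id == request_id]
    if not matching:
        return None
    return min(matching, key=lambda r: r.arrival_ms)


# Hypothetical timings: the attacker is on the local segment and answers first.
spoofed = Response(query_id=0x1A2B, address="1.2.3.4", arrival_ms=0.8)
legitimate = Response(query_id=0x1A2B, address="192.0.2.10", arrival_ms=25.0)
accepted = naive_client_accept(0x1A2B, [legitimate, spoofed])
print(accepted.address)  # prints 1.2.3.4
```

Because the only checks are the matching ID and the arrival order, the content of the legitimate response never influences the outcome.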

2 Detection of DNS spoofing attacks

As discussed in Section 1.2, if a spoofing attack occurs, the client should receive at least two response packets, one legitimate response packet and one spoofed attack packet. Based on this feature, this kind of attack can be detected by certain methods. Detection methods can be divided into three types: passive monitoring detection, fake message detection, and cross-checking query:

Passive monitoring detection: This detection method captures all DNS request and response packets by means of bypass monitoring and creates a request–response mapping table for them. If, within a certain time interval, a request corresponds to two or more response packets with different results, a DNS spoofing attack is suspected, because the DNS server will not give multiple response packets with different results—even when the target domain name corresponds to more than one IP address, the DNS server will still return a single DNS response packet, just one that has multiple Answer sections.
Fake message detection: This detection method uses active sending of sensor packets to detect the presence of DNS spoofing attackers within the network. This detection method is based on a simple assumption: in order to send out spoofed packets as quickly as possible, the attacker will not validate the IP address of the domain name server. If a request packet is sent to a non–DNS server, normally no response will be received, but since the attacker will not verify whether the target IP address is that of a legitimate DNS server, he will continue with the spoofing attack, so if a response packet is received, it is an indication of an attack.
Cross-checking query: So-called cross-checking means that after the client receives the DNS response packet, it reverse-queries the DNS server for the DNS name corresponding to the IP address returned in the response packet. If the two are consistent, it means that there is no spoofing attack; otherwise it indicates the presence of a spoofing attack.
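
The request–response mapping table used by passive monitoring detection can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `PassiveMonitor`, the key layout, and the 2-second window are assumptions, and packet capture itself is left out, so the caller passes in already-parsed fields and timestamps.

```python
from collections import defaultdict


class PassiveMonitor:
    """Request-response mapping table keyed by (client, transaction ID, name)."""

    def __init__(self, window_s=2.0):
        self.window_s = window_s          # assumed observation interval
        self.requests = {}                # key -> time the request was seen
        self.answers = defaultdict(set)   # key -> distinct answer sets seen

    def on_request(self, client, query_id, name, now):
        key = (client, query_id, name)
        self.requests[key] = now
        self.answers[key].clear()

    def on_response(self, client, query_id, name, addresses, now):
        """Record a response; return True if spoofing is suspected."""
        key = (client, query_id, name)
        sent = self.requests.get(key)
        if sent is None or now - sent > self.window_s:
            return False  # unsolicited or too late to correlate
        # A response listing several addresses is one result, so compare sets.
        self.answers[key].add(frozenset(addresses))
        return len(self.answers[key]) > 1
```

Comparing whole answer sets rather than single addresses keeps a legitimate multi-address response, or a retransmitted identical response, from raising an alarm.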

Three DNS spoofing attack detection methods are discussed above, of which the passive monitoring detection method is a passive method and the other two are active methods. The passive monitoring detection method does not cause additional traffic on the network, but, being passive, it can only detect attacks that are actually under way and cannot reveal potential attacks. The fake message detection method requires actively sending a large number of sensor packets, which increases the burden on the network. In addition, DNS spoofing attacks generally only spoof specific domain names, so there is a great deal of uncertainty as to the selection of domain names to be resolved in the sensor packet, thus increasing the difficulty of detection. The cross-checking query method is a trade-off between the two in that it actively validates the received response packets on the basis of passive detection, but this method relies more on the reverse lookup service of DNS servers, and a large number of DNS servers do not provide such a service.

The three detection methods have their own advantages and disadvantages, and the three can be effectively combined in practical applications to complement each other, so as to achieve good detection results.
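
The fake message detection method described above can be sketched with the standard `socket` and `struct` modules. This is a minimal sketch under stated assumptions: the function names `build_query` and `probe`, the transaction ID 0x4242, and the one-second timeout are hypothetical, and the caller must supply an address that is known not to run a DNS service. The query follows the RFC 1035 wire format.

```python
import socket
import struct


def build_query(name, query_id):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def probe(non_dns_ip, name, query_id=0x4242, timeout_s=1.0):
    """Send a query to a host known NOT to run DNS; any reply is suspicious."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        try:
            sock.sendto(build_query(name, query_id), (non_dns_ip, 53))
            data, _addr = sock.recvfrom(512)
        except OSError:
            return False  # timeout or network error: no spoofer observed
    # Only a reply echoing our transaction ID counts as a spoofed answer.
    return len(data) >= 2 and struct.unpack("!H", data[:2])[0] == query_id
```

A `True` result from `probe` corresponds to the case in the text where a response is received although the target is not a DNS server; the choice of `name` remains the open problem noted above, since an attacker may spoof only specific domain names.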

3 Prevention of DNS spoofing attacks

Through the analysis of legitimate and spoofed response packets, it is found that spoofed response packets are generally simple and usually have only one Answer section and no Authority section or Additional section. This is also in line with the spoofing attacker’s intention to return the spoofed packet to the client as soon as possible, as the only way to make the spoofed packet arrive earlier than the legitimate packet is to save as much time as possible in packet construction. Legitimate response packets, on the other hand, are rich in information and usually have Authority sections and Additional sections, in addition to possibly having multiple Answer sections. If spoofed packets and legitimate packets can be distinguished according to certain rules, then DNS spoofing attacks can be avoided, thus making the system resistant to attacks. Here are a few possible preventive measures:

Weighting method. This method first gives each field in the DNS response packet a corresponding confidence weight based on statistical analysis, then calculates the final confidence of each packet from the fields it contains, and finally selects the response packet with the highest confidence. The weight is a signed number, with the plus sign denoting the addition of the corresponding value and the minus sign subtraction. The computational rules are described below:

Let the packet confidence weight be $S$, let $W_i$ be the weight of the $i$-th attribute, $N_i$ be the count of the $i$-th attribute, and $m$ be the total number of attributes. Then we have the following equation:

$S = \sum_{i=1}^m W_i * N_i$

The accuracy of this method relies heavily on the distribution of weights, and as long as the weights are set appropriately, satisfactory recognition results can be achieved.
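
As a small illustration of the weighting rule $S = \sum W_i N_i$, the sketch below scores two candidate responses and selects the one with the highest confidence. The function names and the per-packet record counts are hypothetical; the attribute order (Answer, Authority, Additional) and the two weight policies are the ones that appear in Table 3.

```python
def confidence(counts, weights):
    """S = sum over attributes of W_i * N_i."""
    if len(counts) != len(weights):
        raise ValueError("counts and weights must have the same length")
    return sum(w * n for w, n in zip(weights, counts))


def pick_most_trusted(responses, weights):
    """Return the label of the response with the highest confidence S."""
    return max(responses, key=lambda label: confidence(responses[label], weights))


# Hypothetical record counts per packet: (Answer, Authority, Additional).
responses = {
    "spoofed": (1, 0, 0),
    "legitimate": (1, 2, 2),
}

print(pick_most_trusted(responses, (1, 1, 1)))  # prints legitimate
```

With the policy (1,0,0) both packets in this example score $S = 1$, so the Answer section alone cannot separate them; this is one way to read the weaker result of that policy in the experiments.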

Bayesian classification. This approach utilizes the idea of pattern classification to design a two-class Bayes classifier to distinguish between legitimate and spoofed packets. The features of legitimate and spoofed packets are first extracted based on statistical information, then statistics about the probability distributions of these features are compiled, and a simple two-class Bayes classifier is designed from this to guide the identification of spoofed and legitimate packets. This paper only presents the idea of identification; the design of the classifier is not its focus, so only one feature is used here as an example for a brief introduction.

Statistics from China and abroad show that the distribution and number of DNS servers for the same domain name have certain features. Men & Mice^[5] surveyed the distribution of DNS servers for the GOV and COM domains and several ccTLDs. The results are shown in Table 2.

Table 2: Domain name server distribution statistics
Test date	Domain	All servers are on the same subnet (%)	Single authoritative domain name server (%)
2001-11-08	GOV	23.15%	13.07%
2001-11-30	COM	36.2%	6.8%
2001-10-03	DK	55.2%	8.8%
2001-10-03	FI	48.2%	2.3%
2001-10-03	NO	29.5%	5.7%
2001-10-03	SE	41.1%	4.0%

The authors' survey on the top 100 websites in China shows similar results, as shown in Figure 3.

A bar chart. The x-axis is labeled “Number of authoritative servers”. The y-axis is labeled “Number of websites”. The approximate heights of the bars are: 0: 0; 1: 10; 2: 59; 3: 21; 4: 5; 5: 4; 6: 0; 7: 0; 8: 0; 9: 1; 10: 10. — Figure 3: Distribution of domain name servers of the top 100 websites in China

From the above statistics, it can be seen that roughly 90% of domain names have multiple authoritative domain name servers, which means that the probability of a legitimate DNS response packet containing multiple Authority sections is about 90%. This can be used as a key feature in the design of Bayes classifiers.

Let $W_1$ denote that the packet is a legitimate packet, and $W_2$ that the packet is a spoofed packet. Let the feature $x$ denote the number of Authority sections contained in the packet, and $n$ be the number of responses received with different results for the same DNS request over a period of time. Bayes’ formula is as follows:

$\begin{array}{l} P(W_1|x) = \frac{P(W_1)p(x|W_1)}{P(x)}, \quad P(W_2|x) = \frac{P(W_2)p(x|W_2)}{P(x)}\\ \textrm{and }P(W_1|x) + P(W_2|x) = 1 \end{array}$

Where:

$P(W_1)$: The probability that the packet is a legitimate packet, which takes the value $1/n$ since there is only one legitimate packet;
$P(W_2)$: The probability that a packet is a spoofed packet, which takes the value $1-P(W_1)=1-1/n$;
$P(W_1|x)$: The probability that a packet is a legitimate packet when the packet contains $x$ Authority sections;
$P(W_2|x)$: The probability that a packet is a spoofed packet when the packet contains $x$ Authority sections;
$p(x|W_1)$: The distribution of the number of Authority sections in legitimate DNS response packets, as shown in Figure 3.
$p(x|W_2)$: The distribution of the number of Authority sections in spoofed DNS response packets, with the distribution function $P(x|W_2) = \begin{cases}0.1,&x \ge 1\\0.9,&x = 0\end{cases}$
$P(x)$: The total probability that a packet contains $x$ Authority sections, $P(x) = P(x|W_1)P(W_1) + P(x|W_2)P(W_2)$

Construct a two-class classifier:

$\begin{array}{l} g(x)\\ = P(W_1|x) - P(W_2|x)\\ = \frac{P(W_1)P(x|W_1) - P(W_2)P(x|W_2)}{P(x)} \end{array}$

Since the normalization constant $P(x)$ is positive and therefore has no effect on the sign of $g(x)$, and hence on the final classification, it can be removed, thus

$\begin{array}{l} g'(x)\\ = P(W_1)P(x|W_1) - P(W_2)P(x|W_2)\\ = \frac{1}{n} * P(x|W_1) - \frac{n - 1}{n} * P(x|W_2)\\ = \frac{P(x|W_1) - (n - 1) * P(x|W_2)}{n} \end{array}$

Let the error rate of this Bayes classifier be $P(\mathrm{error}|x)$. The following conclusion can be drawn:

$P(\mathrm{error}|x) = \begin{cases}P(W_1|x),&P(W_1|x)\le P(W_2|x)\\P(W_2|x),&P(W_1|x)>P(W_2|x)\end{cases}$

i.e.,

$P(\mathrm{error}|x) = \min[P(W_1|x),P(W_2|x)]$

This classifier can then be used to identify the legitimate response packet. Given a packet, first count the number of its Authority sections and then compute $g'(x)$. If $g'(x)>0$, it is classified as a legitimate packet; otherwise it is classified as a spoofed packet, with error probability $P(\mathrm{error}|x)$. Of course, using only a single feature may lead to a high error rate; this paper only presents the idea. For specific classifier design and error analysis, see reference^[6].
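
The decision rule can be tried out numerically. The sketch below is an illustration under explicit assumptions, not the authors' classifier: $p(x|W_1)$ is estimated by normalizing the approximate bar heights read from Figure 3, $p(x|W_2)$ is the distribution function stated in the text, and the function names are hypothetical.

```python
# Approximate bar heights read from Figure 3 (number of sites per
# authoritative-server count). Normalizing them into a distribution is an
# assumption of this sketch.
FIGURE3_COUNTS = {0: 0, 1: 10, 2: 59, 3: 21, 4: 5, 5: 4,
                  6: 0, 7: 0, 8: 0, 9: 1, 10: 10}
TOTAL = sum(FIGURE3_COUNTS.values())


def p_x_given_legit(x):
    """Estimated p(x|W1): share of sites with x authoritative servers."""
    return FIGURE3_COUNTS.get(x, 0) / TOTAL


def p_x_given_spoofed(x):
    """p(x|W2) as stated in the text: 0.9 for x = 0, 0.1 for x >= 1."""
    return 0.9 if x == 0 else 0.1


def g_prime(x, n):
    """g'(x) = (1/n) p(x|W1) - ((n-1)/n) p(x|W2), for n >= 1 responses."""
    prior_legit = 1.0 / n
    prior_spoofed = 1.0 - prior_legit
    return prior_legit * p_x_given_legit(x) - prior_spoofed * p_x_given_spoofed(x)


def classify(x, n):
    """Apply the rule from the text: legitimate if g'(x) > 0, else spoofed."""
    return "legitimate" if g_prime(x, n) > 0 else "spoofed"
```

With $n = 2$ candidate responses, `classify(2, 2)` yields a legitimate packet and `classify(0, 2)` a spoofed one. Under these estimated numbers a packet with exactly one Authority section falls on the spoofed side, which illustrates the error a single feature can introduce.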

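Cross-validation method. This method is the cross-checking query described in Section 2, applied to packet identification. After the client receives a DNS response packet, it reverse-queries the DNS server for the DNS name corresponding to the IP address returned in the response packet. A packet whose reverse lookup result is consistent with the queried name is treated as legitimate; otherwise it is treated as spoofed.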
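
A minimal sketch of this check is given below. The function names are hypothetical, and the default reverse lookup uses the system resolver through `socket.gethostbyaddr` as a stand-in for reverse-querying the DNS server; the lookup is passed in as a parameter so that another resolver can be substituted.

```python
import socket


def system_reverse_lookup(address):
    """Reverse-resolve an address with the system resolver; empty set on failure."""
    try:
        hostname, aliases, _addresses = socket.gethostbyaddr(address)
    except OSError:
        return set()  # no reverse record, or the lookup failed
    return {hostname.lower().rstrip(".")} | {a.lower().rstrip(".") for a in aliases}


def cross_check(name, answered_address, reverse_lookup=system_reverse_lookup):
    """Return True if the reverse lookup of the answer maps back to the name."""
    names = reverse_lookup(answered_address)
    return name.lower().rstrip(".") in names
```

When no reverse record exists, `cross_check` returns `False` even for a legitimate answer, which reflects the dependence on the reverse lookup service discussed in this paper.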

Three methods for identifying spoofed and legitimate response packets are discussed above, among which the weighting method and the Bayesian classification method are similar in that they are both based on statistical analysis. The weighting method relies on the formulation of a weighting policy, and the Bayesian classification method relies on the extraction of key features of packets and the compilation of statistics about their probability distributions. The cross-validation method can be carried out at the same time as spoofing detection, since it uses the same reverse lookup as the cross-checking query in Section 2, but it relies heavily on the reverse lookup service and is difficult to use on a large scale. The above three identification schemes can be used in combination to complement each other’s advantages, thus achieving good identification results.

4 Experimental results and analysis

The experiment uses the well-known ADMID as a DNS attack tool, but ADMID itself is not fully representative of all attack modes because it does not process the Authority section and the Additional section. For this reason, the authors introduced a 10% Authority section floating factor in the construction of the spoofed packet, so that a spoofed packet carries an Authority section with a probability of 10%, which makes the experiment more realistic. In the experiment, the weighting method and the Bayesian classification method were used to carry out domain name resolution tests on the top 100 websites in China. The experimental results are shown in Table 3.

Table 3: Experimental results of domain name spoofing attack identification
Identification scheme	Number of DNS requests	Number of spoofing attempts	Number of successful attack detections	Number of successful legitimate packet identifications
Weighting method (1,0,0)	1000	1000	973	726
Weighting method (1,1,1)	1000	1000	984	973
Bayesian classification (single-feature)	1000	1000	977	936

The values in parentheses after the weighting method in Table 3 give the weight assignment policy, specifying the weights of the Answer section, the Authority section, and the Additional section, in that order. As can be seen from Table 3, when the weighting method judges only by the Answer section, the identification rate for legitimate packets is 72.6%, which is a large error. When the weighting policy is adjusted to consider the Answer section, the Authority section, and the Additional section together, the identification rate is greatly improved, reaching 97.3% on average. The identification rate of the single-feature Bayesian classification method is also high at 93.6%, as the fake message constructed by the current attack tool is simple. Of course, if the attack tool adjusts the message information, the weight distribution and features have to be adjusted accordingly to achieve a satisfactory identification effect.
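
The percentages quoted above can be recomputed from the counts in Table 3. The short sketch below assumes that the detection rate is taken relative to the number of spoofing attempts and the identification rate relative to the number of DNS requests; the helper name `rates` is hypothetical.

```python
# Rows of Table 3: (scheme, requests, spoofing attempts,
#                   attacks detected, legitimate packets identified)
TABLE3 = [
    ("Weighting method (1,0,0)", 1000, 1000, 973, 726),
    ("Weighting method (1,1,1)", 1000, 1000, 984, 973),
    ("Bayesian classification (single-feature)", 1000, 1000, 977, 936),
]


def rates(row):
    """Return (attack detection rate, legitimate packet identification rate)."""
    _scheme, requests, attempts, detected, identified = row
    return detected / attempts, identified / requests


for row in TABLE3:
    detection, identification = rates(row)
    print(f"{row[0]}: detection {detection:.1%}, identification {identification:.1%}")
```

Under these assumptions the identification rates come out as 72.6%, 97.3%, and 93.6%, matching the figures in the text, while the detection rates are 97.3%, 98.4%, and 97.7%.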

5 Summary

The contest between network attack and defense has always been a driving force in the development of cybersecurity. Only by continuously identifying security weaknesses in the network and correcting them can the whole network be made healthier. Based on the current DNS architecture, this paper proposes a new detection and prevention scheme for DNS spoofing attacks, which has a positive effect on improving DNS security and making DNS more resistant to attacks. Cache poisoning is another means of attack against DNS. Its detection and prevention, as well as tracing back attackers, will be the focus of the next phase of our work.

References

[1] Mockapetris P. Domain Names - Concepts and Facilities[S]. RFC 1034, 1987.

[2] SANS Institute. The Twenty Most Critical Internet Security Vulnerabilities[Z]. http://www.sans.org/top20/, 2004.

[3] Lin Manjun. Security Protection of Domain Name Servers[J]. Cybersecurity Technology & Application, 2001, (1): 21-24.

[4] Lioy A, Maino F, Marian M. DNS Security[C]. Proc. of Terena Networking Conference, 2000.

[5] Men & Mice. Single Point of Failure Research[Z]. http://www.menandmice.com/6000/6300_single_point_failure.html, 2001.

[6] Duda R O, Hart P E, Stork D G. Pattern Classification (2nd Edition)[M]. New York: Wiley & Sons, 2001.