This is an English translation of the research paper "高速网络环境下入侵检测系统结构研究" ("Research on the Architecture of Intrusion Detection Systems in High-Speed Network Environments"), published in the Journal of Computer Research and Development.
(PACT Laboratory, College of Computer Science, Harbin Institute
of Technology, Harbin 150001)
(cxx@mail.nisac.gov.cn)
Date received: 2003-07-15; date revised: 2003-10-17
Fund project: National High-tech R&D Program of China “863”
(2002AA147020)
This paper proposes an architecture for intrusion detection systems in high-speed network environments. It effectively addresses the processing performance challenges of cybersecurity analysis on multi-line, high-bandwidth backbone links by integrating raw signal coupling technology, aggregation and balancing technologies, and an efficient data stream engine (packet capture and stream reassembly). The architecture is well layered, offering high scalability and adaptability, and it can accommodate a wide range of complex network environments, from low-speed access networks to high-speed backbone networks (multi–OC-48 links and higher), as well as various types of interface formats. A system based on this architecture achieves line-speed performance in an eight-line OC-48 network environment when 16 data stream buses are configured, exceeding the best performance levels reported for comparable systems.
Intrusion detection systems (IDS) are currently an important and practical technical means of addressing cybersecurity issues. As network scale expands, bandwidth increases, technology advances, and the number of users surges, high-speed network environments are becoming increasingly prevalent. We define a high-speed network environment as one with multiple backbone lines operating at speeds of 2.5 Gbps or higher. An IDS is typically deployed at the egress of the protected network and on core switches. One of the outstanding problems in current IDS research is the challenge of data processing speed. The well-known information security research and consulting firm Gartner, Inc. argued that IDS would be gradually phased out by 2005[1]. One of the four key reasons for this argument was that IDSes of the time were incapable of handling transmission rates exceeding 600 Mbps. The U.S. Department of Energy has identified high-speed intrusion detection systems as one of the key research priorities for IDS[2].
To date, relatively few results have been published internationally. Sekar et al. proposed a high-performance IDS capable of processing speeds up to 500 Mbps[3], but it operates on offline data. ISS's GigaSentry and Cisco's related products can reach a processing speed of 50 Kpps. A more practical high-speed intrusion detection architecture is the stateful intrusion detection system for high-speed networks proposed by Christopher Kruegel et al.[4] This system consists of six parts: a network tap; a traffic scatterer; a set of m traffic slicers S0, …, Sm−1; a switch; a set of n data stream reassemblers R0, …, Rn−1; and a set of p intrusion detection sensors. The tap captures the sequence of data frames on the high-bandwidth link, denoted F = ⟨f0, f1, …, ft⟩, within a specific time period ∆, and transmits it to the traffic scatterer. The scatterer uses a classification algorithm to split F into m subsequences Fj: 0 ≤ j < m. Each Fj is a subset (possibly empty) of F. Each data frame fi belongs to one and only one subsequence Fj, so F = F0 ∪ F1 ∪ … ∪ Fm−1 and the Fj are pairwise disjoint. The classification algorithm uses a round-robin policy to split F evenly into m subsequences, so each Fj carries one mth of the total traffic. Each subsequence Fj is sent to a traffic slicer Sj, which forwards associated data frames in Fj (the data frames belonging to the same attack scenario) to the same reassembler. The m traffic slicers and n reassemblers are interconnected via a switch to form an m × n unidirectional cross matrix. The raw high-volume traffic is thus decentralized and redirected into several smaller streams that individual sensors can handle, solving the traffic problem without losing information.
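The scatterer's round-robin classification described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; `round_robin_scatter` is a hypothetical name, and frames are represented as plain Python objects rather than captured network data:

```python
def round_robin_scatter(frames, m):
    """Split a frame sequence F into m subsequences F_0..F_{m-1}.

    Frame f_i goes to subsequence i mod m, so each F_j carries
    roughly one mth of the traffic and the F_j are pairwise
    disjoint, with their union equal to F."""
    subsequences = [[] for _ in range(m)]
    for i, frame in enumerate(frames):
        subsequences[i % m].append(frame)
    return subsequences
```

Note that round-robin balances volume but ignores flow affinity, which is why the slicers downstream must still redirect frames of the same attack scenario to a common reassembler.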
This architecture effectively addresses the traffic splitting problem in a high-traffic environment, allowing the backend processing system to handle data far beyond the capacity of a single-node processor through a clustered approach, and it is highly scalable. However, it has two shortcomings. First, it monitors only a single high-traffic line, and an extension is needed to support the multi-line monitoring required in most cases. Second, communication channels are required between intrusion detection sensors to enable collaborative analysis of time-related events across multiple lines. We address these two points and integrate the traffic scatterer, slicers, and switch into a unified subsystem, creating an intrusion detection architecture suitable for complex multi-line, high-traffic network environments, as shown in Figure 1:
The architecture consists of a set of m raw signal couplers, an aggregation and splitting subsystem, a set of n data stream buses, n sets of sensors, a response mechanism (optional), and a logging subsystem. The function and structure of each of these subsystems are described below.
A raw signal coupler is a network's raw data output mechanism, such as the mirror port of a switch or the monitoring port of a hub. The data signals on the line are reproduced in their entirety through these ports and sent to the next level for primary processing. With the development of network and transmission technologies, the single-line bandwidth of raw signal sources has evolved from 10 Mbps (Ethernet, E1/E3, and other interfaces), to 100 Mbps (FE, OC-3, OC-12, and other interfaces), 1 Gbps (GE, OC-48, and other interfaces), and now to 10 Gbps (10GE, OC-192, and other interfaces). Currently, due to the uneven development among various operators and the differing needs of application environments, the aforementioned raw data interface types coexist in both domestic enterprise networks and operator networks.
The coupling technology for raw signals is based on monitoring mechanisms, of which there are currently four types: shared hub monitoring, device port mirroring monitoring, optical coupling monitoring for certain types of interfaces, and dedicated device monitoring. If the network environment lacks the aforementioned devices/lines, a transport protocol conversion is required. The specific raw signal coupling technologies corresponding to different environments are as follows:
FE/Ethernet line (electrical port) signal coupling technology
This technology is suitable for data acquisition on Ethernet and fast Ethernet lines with electrical interfaces. The advantage of this technology is that it allows simultaneous acquisition of bidirectional data, while reducing the need for aggregation. However, the disadvantage is that when the sum of bidirectional traffic approaches or exceeds the bandwidth of a single line, it can cause congestion on the monitoring line. In the case of shared hubs, increased collisions may result in a sharp decline in available bandwidth. For mirror ports, monitoring capacity can be improved by increasing the number of mirror ports, but this requires the use of aggregation technology. The solution is to use high-bandwidth ports as monitoring ports, such as using GE ports to monitor on FE ports, to avoid congestion (as shown in Figure 2).
Figure 2: FE/Ethernet line (electrical port) signal coupling technology
Low-speed WAN line signal coupling technology
This technology is suitable for signal coupling of low-speed WAN lines such as DDN, E1, and E3 (as shown in Figure 3). The conversion device in Figure 3 can be either a layer-3 device or a link-layer device. If a layer-3 device is used, routing must be configured on it, but it can support multi-line WAN aggregation and conversion, offering broader applicability. If a link-layer device is used, no layer-3 configuration is needed and only simple installation and setup are required; however, it supports only single-line conversion and no WAN aggregation.
Figure 3: Low-speed WAN line signal coupling technology
Optical fiber link signal coupling technology
This technology is more widely used in cable TV signal transmission and billing systems. It employs optical couplers to duplicate optical signals into one or more copies. Its characteristics include being a passive device with stable operation. However, bidirectional data is output separately, aggregation is required, and it cannot filter out unnecessary data. It is suitable for GE, 10/100BASE-FX, and other optical interfaces (as shown in Figure 4).
Figure 4: Optical fiber link signal coupling technology
Dedicated signal coupling device
A dedicated data acquisition device can perform certain preprocessing on the monitored data before outputting it, such as filtering out invalid packets, attack packets, and so on. Because it is connected in series on the line, it may degrade the raw data signal, so it demands high levels of reliability and fault tolerance. This technology is commonly used in communication instruments (as shown in Figure 5).
Figure 5: Dedicated signal coupling device
This subsystem is responsible for performing necessary interface conversions of raw network data streams—converting network device communication interfaces (e.g., POS, ATM, E1, etc.) into host communication interfaces (e.g., FE, GE, etc.), and merging and evenly splitting the data streams before outputting them to the processor cluster. In a low-bandwidth network environment, if the output port of the monitoring data is in the form of a host interface (e.g., FE, GE, etc.), then multiple interface cards can be configured on the sensor units to receive the raw data signals and process them directly. If there are multiple lines between the protected network and the external network, there are multiple monitoring data lines that need to be aggregated, classified by stream, and then output to the data stream bus in a balanced manner.
Conversion device aggregation technology
This technology is applicable to FE/GE monitoring lines. It assigns a separate VLAN to each monitoring line's receiving port and configures a high-speed aggregation port (GE). Using SPAN, all incoming traffic from the monitoring lines' receiving ports is mirrored to the aggregation port for output. In this way, once a monitoring data stream enters the aggregation switch, a copy is made and sent to the aggregation port. The switch then simply discards the original packets, because no output port corresponds to the packets' destination MAC address and the receiving port's VLAN contains no other active ports to broadcast to. This technology is suitable for monitoring environments where the total traffic from the monitoring lines is less than the bandwidth the packet processor can handle (as shown in Figure 6).
Figure 6: Conversion device aggregation technology
Aggregation and balancing technologies based on hash stream classification
When the total traffic from the monitoring lines exceeds the bandwidth capacity of a single packet processor, traffic needs to be split. To ensure a balanced distribution, the traffic splitting algorithm must operate with a sufficiently fine granularity. Currently, there are two options. One is to split the traffic according to the destination IP address. The advantage of this method is that an off-the-shelf routing device can be used to achieve the purpose, while the disadvantage is that the output traffic varies greatly, with sharp spikes on the traffic variation curve, leading to a suboptimal mean square deviation. Under extreme conditions, the output traffic may exceed the bandwidth of the output port or the processing capacity of the packet processor, which results in packet loss or packet processor congestion. Splitting by IP address makes it difficult to evenly split traffic across output ports, but aside from burst traffic, output traffic remains relatively stable most of the time. Therefore, this technology is still an economical choice for network environments that are not heavily loaded with traffic. The other option is stream-based traffic splitting (originally derived from layer 4–7 switches and layer 4–7 load balancing systems, as shown in Figure 7); i.e., performing a certain hash operation H(Sip, Dip, Sp, Dp) on several stream-related parameters within a data packet p: the source address Sip, destination address Dip, source port Sp, and destination port Dp. H(p) must satisfy the following condition:
H(Sip, Dip, Sp, Dp) = H(Dip, Sip, Dp, Sp).
There are n output ports. For an incoming packet p, the output port number Tn is calculated as follows:
Tn = H(Sip, Dip, Sp, Dp) mod n + 1.
The larger the value of H(p)/n, the finer the traffic splitting granularity, resulting in more balanced splitting and stronger resistance to traffic spikes. This is because, in the case of large traffic volumes related to concentrated IP addresses (such as access traffic to large websites or high-speed proxy servers), while the destination address of the TCP stream remains unchanged, the source ports are constantly changing (usually incrementing cyclically, for most clients). Currently, many new general-purpose layer-3 devices support traffic classification features, which can effectively perform stream-based traffic splitting. The experimental comparison data for the two traffic splitting technologies described above are shown in Figure 8 and Figure 9.
Figure 7: Aggregation and balancing technologies based on hash stream classification
Figure 8: Effect of splitting based on hash stream classification
Figure 9: Effect of splitting based on IP address
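The hash-based splitting described above can be sketched in a few lines. The concrete hash (CRC32 over each endpoint, combined with XOR) is an illustrative assumption; the paper only requires that H be symmetric in the two endpoints so that both directions of a connection land on the same output port:

```python
import zlib

def endpoint_key(ip, port):
    # Deterministic per-endpoint hash (CRC32 is an illustrative choice)
    return zlib.crc32(f"{ip}:{port}".encode())

def flow_hash(sip, dip, sp, dp):
    # XOR is commutative, so H(Sip, Dip, Sp, Dp) == H(Dip, Sip, Dp, Sp):
    # both directions of one connection hash to the same value
    return endpoint_key(sip, sp) ^ endpoint_key(dip, dp)

def output_port(sip, dip, sp, dp, n):
    # Tn = H(Sip, Dip, Sp, Dp) mod n + 1, with ports numbered 1..n
    return flow_hash(sip, dip, sp, dp) % n + 1
```

Because the hash covers ports as well as addresses, many connections to a single busy server still spread across ports, which is the spike-resistance property discussed above.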
Since sensors need to perform operations such as recovery, decoding, matching, and repeated search on the raw network data, they consume a lot of CPU resources. Therefore, multiple specialized sensors with different functions are needed to process a single piece of raw data. In a high-speed network environment with large bandwidth, it is impractical to store raw network data before processing, so a bus system is required to broadcast the raw data to all the sensors. For the currently common host interfaces FE and GE, there are two widely used forms: fast Ethernet bus and optical coupling. The fast Ethernet bus connects the sensors and the output ports of the traffic splitting subsystem via a hub device. The receiving ports of the sensors are set to silent mode, thus enabling one-to-many data replication at the data link layer. An advantage of this setup is that signal regeneration amplifies the data without a reduction in energy, while erroneous packets are discarded. The optical coupling method involves using an optical coupler for passive signal duplication at the physical layer. Its advantage is that no abnormal packets are lost; however, energy will decrease, thus limiting the number of sensors that can be connected.
IDS sensors perform functions such as packet reception, data stream recovery, decoding, and analysis of attack characteristics. Stateful sensors also need to maintain the state of each connection. Sensor software consists of a data stream engine and an analysis engine.
The function of data stream engine consists of two parts. The first part is to receive raw data packets from the packet shaper and place them into the user buffer, which is generally known as packet capture technology. The second part is stream reassembly, which performs step-by-step reassembly from data frames to IP fragments, to IP packets, and finally to TCP-layer connections.
The development of packet capture technology has gone through three stages so far. The first stage uses the library functions provided by the operating system, such as the libpcap library on Linux and the Winsock2 library on Windows, which are characterized by good hardware and operating system compatibility. However, because a data packet arriving at the network card goes through DMA into kernel memory space and is then copied several more times before reaching user space, the overhead is significant and a high packet capture rate is difficult to achieve. (A 2 GHz CPU running Linux can achieve a packet capture rate of 50 Kpps, with FreeBSD performing slightly better.) Additionally, some operating systems (such as Linux) generate an interrupt for each received packet; when packet traffic is too high, the system may be overwhelmed by frequent interrupts, leading to system paralysis.
The second stage introduces buffering techniques, for example, modifying network card parameters to reduce interrupt frequency and writing a specialized driver that buffers packets after they arrive in kernel memory space. Once a certain number of data packets (e.g., 1 K packets) have accumulated, they are submitted to the user application in a batch. This mode still maintains relatively good hardware compatibility, at the cost of stronger coupling with the operating system, and offers a significant performance improvement over the first stage. (With a 2 GHz CPU running Linux, the packet capture rate can exceed 100 Kpps.)
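The second-stage batching idea can be illustrated with a small sketch; `BatchBuffer`, its callback interface, and the batch size are hypothetical stand-ins for what a real kernel driver would implement:

```python
class BatchBuffer:
    """Second-stage capture sketch: packets accumulate in a buffer
    and are handed to the user application in batches, amortizing
    the per-packet interrupt and copy overhead. The batch size
    (here 1024) is illustrative, mirroring the ~1 K figure above."""

    def __init__(self, deliver, batch_size=1024):
        self.deliver = deliver        # user-space callback
        self.batch_size = batch_size
        self.buf = []

    def on_packet(self, pkt):
        # Called per received packet (in a driver: per DMA completion)
        self.buf.append(pkt)
        if len(self.buf) >= self.batch_size:
            self.flush()

    def flush(self):
        # Submit whatever has accumulated to the user application
        if self.buf:
            self.deliver(self.buf)
            self.buf = []
```

The same structure, moved into the driver with user-space-mapped buffers, is essentially what the third (zero-copy) stage does.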
The third stage involves using zero-copy technology. By modifying the network card driver, after receiving a certain number of packets, the network card can transfer the data packets to a buffer in user memory space via DMA, thereby reducing the number of memory copies required for packet capture to zero (hence the name “zero-copy”). This mode offers poor hardware transparency, as it requires detailed understanding of the technical information of the network card. In addition, since packet capture bypasses the system’s protocol stack, it results in stronger coupling with the operating system. However, by addressing the memory copy bottlenecks in packet capture, this mode has a significant performance improvement. (On a 2 GHz CPU running the Linux operating system, this mode can achieve a packet capture rate of over 600 Kpps. With mixed packet lengths, it can easily enable true gigabit line-speed packet capture, making gigabit-interface firewalls genuinely practical.)
Stream reassembly technology is primarily based on the relevant standards of the TCP protocol stack and uses finite state automata (FSA) for reassembly. Since there are many techniques specifically designed to disrupt IDS stream reassembly, such as the insertion attacks, evasion attacks, and connection-pool DoS attacks described by Mark Handley et al.[5], corresponding identification methods must be employed during reassembly to counter them. Typically, methods such as anomaly detection (e.g., PHAD proposed by Mahoney et al.[6]) and traffic shaping[4] can be used.
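A minimal sketch of stream reassembly with a simple anti-insertion policy is shown below. The "first byte wins" overlap rule is one illustrative choice for resolving overlapping segments, and a real implementation must additionally track FSA connection state, sequence-number wraparound, and timeouts:

```python
def reassemble(segments, isn=0):
    """Minimal TCP payload reassembly sketch.

    segments: iterable of (seq, payload) pairs; isn is the
    initial sequence number. Each byte is placed at offset
    seq - isn, and a byte already written is never overwritten
    ("first wins"), one simple policy against insertion and
    overlap ambiguities."""
    stream = {}
    for seq, payload in segments:
        for i, b in enumerate(payload):
            stream.setdefault(seq - isn + i, b)
    # Return only the contiguous prefix starting at offset 0,
    # i.e. what can safely be passed up to the analysis engine.
    out, off = [], 0
    while off in stream:
        out.append(stream[off])
        off += 1
    return bytes(out)
```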
The function of the analysis engine is to perform tasks such as session recovery, high-layer protocol reassembly, decoding, decompression, and code conversion on the data carried within the transport layer to create a stateful information stream. The engine then matches the information stream against pre-constructed sensitive information patterns. For successfully matched streams, it logs the events and notifies the response mechanism. These tasks are computation-heavy, making the analysis engine the most CPU-intensive component; its performance and efficiency determine the processing capability of the intrusion detection system. Functionally, the analysis engine is divided into two modules. One is a protocol analysis module, which restores the application-layer protocols of interest in the data streams, analyzes the structure of the information streams, and generates various types of structured information streams. The other is a pattern matcher, which matches each structure of interest within the structured information streams against a library of known sensitive patterns.
The protocol analysis module reconstructs the specified application-layer protocols within the data streams, such as HTTP, SMTP, POP3, and BBS, saves the state of each stream’s application-layer protocols, and restores the data streams that can undergo content matching. For example, in HTTP, it is necessary to parse each header field (URL, HOST, CONTENT-TYPE, etc.), identify the content type in the HTTP body, and determine its starting boundaries. Similarly, in SMTP, the module must restore such contents as the sender, recipient, subject, message body, and attachments, as well as perform decoding, boundary identification, and nested message body identification.
In order to improve efficiency, finite state automata (FSA) are used for string matching for sensitive keywords[7, 8]. Since each input information structure needs to be matched, it is necessary to adopt a space-for-time strategy. A multi-pattern finite state automata (MP-FSA) is used to perform one-time keyword matching on the input information stream. For patterns that require fuzzy matching, the matching patterns should also be extended, and a multi-pattern extended finite state automata (MP-EFSA) is used for the matching process.
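The MP-FSA idea can be illustrated with a compact Aho-Corasick automaton, a standard multi-pattern FSA construction that scans the input stream once for all keywords; the function names are hypothetical, and the extended (fuzzy) variant MP-EFSA is not shown:

```python
from collections import deque

def build_mp_fsa(patterns):
    """Build a multi-pattern FSA (Aho-Corasick): a keyword trie
    plus failure links, trading space for a single-pass match."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:                      # build the trie
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    q = deque(goto[0].values())               # depth-1 states fail to root
    while q:                                  # BFS to set failure links
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]            # inherit matches via failure
    return goto, fail, out

def match_stream(fsa, text):
    """One pass over the stream, reporting (position, keyword)."""
    goto, fail, out = fsa
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

The automaton is built once from the sensitive-pattern library; matching cost is then linear in the stream length regardless of the number of keywords.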
For non-keyword-based pattern matching, separate processing is required; that is, the same structure must be processed multiple times, so non-keyword-based matching must be lightweight.
The response mechanism, also known as the linkage mechanism, is one of the hot research topics in the IDS field. Its function is to block, in real time, the attack traffic between the protected network and external networks, actively terminating the attack in progress; it serves as the active actuator of the intrusion detection system. Because the response mechanism can be exploited by attackers, fail on its own, or be used as a stepping stone for DDoS, the academic community has long recommended leaving the IDS response mechanism disabled under normal circumstances. However, as attack methods become increasingly sophisticated and can cause fatal damage to a network in a very short time, the time available for human judgment and response keeps shrinking. Therefore, how to respond to attacks effectively in real time, and how to link with gateway and host devices, have become highly significant research topics.
The development of the response mechanism has gone through two stages: IP packet filtering (static and dynamic IP packet filtering) and connection spoofing (transport-layer and application-layer connection spoofing), which has led to a current state where multiple approaches coexist, each tailored to different applications. It is currently evolving towards a third stage—real-time connection filtering.
Static IP packet filtering is realized by the IDS linking with network-layer devices (such as routers and layer-3 switches) at the boundary between the protected network and external networks; specified IP addresses are filtered by setting access control lists (ACLs) or static routing tables on these devices[8]. Because the number of IP addresses to be filtered is large, most network-layer devices cannot meet the requirements in terms of ACL size and performance, so static routing is more commonly used in practice. With this method, the intrusion detection system can only perform access control by statically writing rules through a dedicated client program. This results in coarse granularity (IP address level), slower response times, and limited capacity; however, the rules are written statically into the routing devices' configuration files, making them non-volatile.
Dynamic IP packet filtering[8] means that the intrusion detection system uses dynamic routing protocols (such as BGP and OSPF) together with key routing devices to propagate the IP addresses that need to be filtered into the devices' routing tables. This method features fast response times and large capacity. However, the entries are written only into the routing table in the routing device's memory (RAM), which is volatile, and the granularity remains coarse. Connection spoofing means that the intrusion detection system forges connection termination signals (such as RST or FIN) during the transmission of a sensitive connection and sends them to the source and destination addresses to interrupt the connection. It is characterized by strong real-time capability and fine granularity (connection level), allowing a specific sensitive connection to be interrupted. The disadvantages are that it depends heavily on the operational status of the analysis system, requires sending packets into the service network, and is vulnerable to DoS attacks.
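A sketch of the connection-spoofing response is given below; `forge_rst` is a hypothetical helper that only builds a bare 20-byte TCP header with the RST flag set. IP encapsulation, the pseudo-header checksum, and raw-socket transmission are omitted, and the sequence number must fall within the live connection's window for the RST to be accepted:

```python
import struct

def forge_rst(src_port, dst_port, seq):
    """Build a bare TCP header carrying only the RST flag, as a
    termination signal to be sent to both endpoints of a
    sensitive connection. Checksum is left zero for the sender
    to fill in over the IP pseudo-header."""
    offset_flags = (5 << 12) | 0x04   # data offset 5 words, RST flag bit
    return struct.pack("!HHIIHHHH",
                       src_port, dst_port,
                       seq,            # must match the live sequence window
                       0,              # ack number (unused for bare RST)
                       offset_flags,
                       0,              # window
                       0,              # checksum (filled in later)
                       0)              # urgent pointer
```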
By linking with connection-level firewall devices, data streams can be filtered on the connection five-tuple (transport protocol type, source address, source port, destination address, destination port). This allows filtering on any specified combination of five-tuple fields, offering strong real-time capability and fine granularity.
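Five-tuple filtering with wildcards can be sketched as follows; the rule representation (`None` as a wildcard field) and the function names are illustrative assumptions:

```python
def tuple_matches(rule, conn):
    """A rule is a five-tuple (proto, sip, sp, dip, dp) in which
    None acts as a wildcard; conn is a connection's concrete
    five-tuple. Every non-wildcard field must match exactly."""
    return all(r is None or r == c for r, c in zip(rule, conn))

def should_filter(rules, conn):
    # Filter the stream if any rule in the active set matches it
    return any(tuple_matches(rule, conn) for rule in rules)
```

Pushing such rules to a connection-level firewall is what gives this third stage both connection granularity and real-time effect, without the IDS itself injecting packets into the service network.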
For an IDS, the logging mechanism is an essential and key component for communicating with and alerting users[9]. For IDS in high-speed network environments, the performance of the logging mechanism is critical. Since IDS is an alert system, it not only identifies known attacks but also generates a certain number of anomaly reports, and in high-speed network environments these reports are generated at a rate that often far exceeds human processing capacity, causing important attack alerts to be buried in a flood of irrelevant alerts and potentially go unnoticed[10, 11]. Therefore, in high-speed network environments the logging mechanism must be redesigned to perform multi-stage processing of alert information (data fusion, clustering, and classification)[12, 13, 14] in order to highlight important information while suppressing routine information. International research in this area has made some progress, and products have already begun to emerge; however, data fusion methods remain relatively simplistic and their effect is still limited[15], indicating there is still a considerable way to go.
The architecture proposed herein for intrusion detection systems in high-speed network environments effectively addresses the processing performance challenges of cybersecurity analysis on multi-line, large-bandwidth backbone links. The architecture is well layered, offering high scalability and adaptability, and it can accommodate a wide range of complex network environments, from low-speed access networks to high-speed backbone networks, as well as various types of interface formats. Backbone-network experiments show that configuring 16 data stream buses enables line-speed processing of network data across eight OC-48 interfaces.
陈训逊 (Chen Xunxun), male, born in 1972, Ph.D. candidate, whose main research areas include computer network and information security, and who was awarded the first prize of National Science and Technology Progress Award in 2002.
方滨兴 (Fang Binxing), male, born in 1960, professor, doctoral supervisor, whose main research areas include computer network and information security, and who was awarded the first prize of the National Science and Technology Progress Award in 2002 (bxfang@mail.cnnisc.gov.cn).
李蕾 (Li Lei), female, born in 1972, Ph.D. candidate, whose main research areas include computer network and information security (lilei@pact518.hit.edu.cn).