These are the visual aids I used to deliver a talk on IPv6 OS fingerprinting on October 16, 2015 at AISec 2015.

Remote Operating System Classification over IPv6

(Authors are in alphabetical order.)
David Fifield UC Berkeley
Alexandru Geana TU Eindhoven / Fox-IT, Delft
Luis MartinGarcia ETSIT, Polytechnic University of Madrid
Mathias Morbitzer Fox-IT, Delft
J. D. Tygar UC Berkeley

This talk is about the IPv6-based OS fingerprinting engine in Nmap, a widely used network security scanner.

# nmap -6 -O ipv6.google.com
Starting Nmap 6.49SVN ( https://nmap.org ) at 2015-09-28 11:36 MDT
Nmap scan report for ipv6.google.com (2607:f8b0:4009:804::1002)
Host is up (0.022s latency).
rDNS record for 2607:f8b0:4009:804::1002: ord08s10-in-x02.1e100.net
Not shown: 998 filtered ports
PORT    STATE SERVICE
80/tcp  open  http
443/tcp open  https
Device type: general purpose
Running: Linux 3.X
OS CPE: cpe:/o:linux:linux_kernel:3
OS details: Linux 3.12 - 3.18

OS detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 8.54 seconds

OS fingerprinting is relevant for network inventory, vulnerability scanning, and exploit tailoring.

History of OS fingerprinting in Nmap

1998–2007: 1st-gen IPv4 fingerprinter (1,684 fingerprints)
2006–present: 2nd-gen IPv4 fingerprinter (4,766 fingerprints)
2011–present: IPv6 fingerprinter (~300 training samples in ~100 classes)

Design goals, based on extensive experience with IPv4:

Low cost to maintain the training database.
Resilience in the face of network interference.

The OS fingerprinting process

Do a port scan to find open/closed ports.
Send up to 18 crafted OS probes (13 TCP, 4 ICMPv6, 1 UDP).
Convert response packets into feature vectors.
Run linear classifiers, one per OS class.
Output the best match if its “novelty” is sufficiently low.

If there is no good match, the system displays a raw fingerprint and asks the user to submit it.
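The final decision step can be sketched as follows. This is a minimal illustration, not Nmap's code: the function name `decide`, the dictionary inputs, and the cutoff value `NOVELTY_THRESHOLD` are all assumptions made for the example.

```python
# Hypothetical cutoff; the value actually used by the engine is not given here.
NOVELTY_THRESHOLD = 15.0

def decide(scores, novelties, threshold=NOVELTY_THRESHOLD):
    """Pick the highest-scoring class, but only report it if its novelty is low.

    scores:    {class_name: classifier score}, higher is better
    novelties: {class_name: novelty of the observed vector for that class}
    Returns ("match", class_name) or ("submit", None).
    """
    best = max(scores, key=scores.get)
    if novelties[best] <= threshold:
        return ("match", best)
    # No trustworthy answer: fall back to asking the user for a submission.
    return ("submit", None)

print(decide({"Linux 3.12": 0.9, "Windows 7": 0.1},
             {"Linux 3.12": 3.0, "Windows 7": 40.0}))
```

Note that a high classifier score alone is not enough: a confident winner with high novelty is still treated as "no good match".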

We use LIBLINEAR in its L₂-regularized logistic regression mode.
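To make "one linear classifier per OS class" concrete, here is a small prediction-only sketch in plain Python. It does not call LIBLINEAR; the weights, biases, class names, and the feature scaling (`HLIM / 255`, `TCP_WSCALE / 14`) are invented for illustration, and real weights would come from training.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical per-class models: (weights, bias) over two scaled features,
# [HLIM / 255, TCP_WSCALE / 14]. These numbers are made up for the example.
MODELS = {
    "Linux 3.12": ([-6.0, -3.0], 4.0),
    "Windows 7": ([6.0, 3.0], -4.0),
}

def score_all(features, models=MODELS):
    """Return {class: probability}, one logistic score per class."""
    return {
        name: sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
        for name, (weights, bias) in models.items()
    }

def best_class(features, models=MODELS):
    """Return the class whose classifier gives the highest score."""
    scores = score_all(features, models)
    return max(scores, key=scores.get)

linux_probe = [64 / 255, 5 / 14]      # HLIM=64, WSCALE=5
windows_probe = [128 / 255, 8 / 14]   # HLIM=128, WSCALE=8
print(best_class(linux_probe), "|", best_class(windows_probe))
```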

Why OS fingerprinting works

Different OSes speak different “dialects” of TCP/IP.

	Linux 3.12	Windows 7	MFC-9440CN printer
S1.PLEN=	40	40	44
S1.HLIM=	64	128	64
S1.TCP_MSS=	1440	1440	1420
S1.TCP_WSCALE=	5	8	0
S1.TCP_WINDOW=	28560	8192	8448
S2.TCP_WINDOW=	28560	8192	8328
S3.TCP_WINDOW=	28560	8192	8792

Probe selection

We looked for “SHOULD”s and “MAY”s in IPv6 standards.

Some IPv6 standards documents
RFC 2460 (IPv6)
RFC 2463 (ICMP for IPv6)
RFC 2473 (Generic Packet Tunneling)
RFC 2675 (Jumbograms)
RFC 3122 (Inverse Discovery)
RFC 3775 (Mobility)
RFC 3971 (Secure Neighbor Discovery)
RFC 4620 (Node Information Queries)
RFC 4782 (Quick-Start)
RFC 4861 (Neighbor Discovery)
RFC 5570 (CALIPSO)

We built a test program with 154 candidate OS probes.

Volunteers tested all 154 probes against a “seed” set of OSes.

We selected 18 probes that offer good efficiency: 13 TCP, 4 ICMPv6, 1 UDP.

Features

IPv6: PLEN, TC, HLIM
TCP: TCP_ISR, TCP_WINDOW, TCP_FLAG_F, TCP_FLAG_S, TCP_FLAG_R, TCP_FLAG_P, TCP_FLAG_A, TCP_FLAG_U, TCP_FLAG_E, TCP_FLAG_C, TCP_FLAG_RES8, TCP_FLAG_RES9, TCP_FLAG_RES10, TCP_FLAG_RES11, TCP_OPT_0 … TCP_OPT_15, TCP_OPTLEN_0 … TCP_OPTLEN_15, TCP_MSS, TCP_SACKOK, TCP_WSCALE, TCP_CORR_WINDOW_MSS
ICMPv6: ICMPV6_TYPE, ICMPV6_CODE

Novelty

A major challenge is identifying when the classifier doesn’t have a good answer (e.g., a never-before-seen type of network printer).

Novelty is the distance of a feature vector from the mean of a class, where each dimension is scaled by the inverse of its variance. (One-sample classes have their variance set to a small constant.)
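One way to read that definition is as a variance-scaled Euclidean distance. The sketch below takes that reading; the square root, the sample-variance estimator, the constant `MIN_VARIANCE`, and the use of that constant as a floor for zero-variance dimensions are all assumptions made for the example.

```python
import math

# Hypothetical small constant used for one-sample classes and as a floor
# so that a zero-variance dimension never causes a division by zero.
MIN_VARIANCE = 0.01

def class_stats(samples):
    """Per-dimension mean and variance for the feature vectors of one class."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    if n == 1:
        variances = [MIN_VARIANCE] * dims
    else:
        variances = [
            max(sum((s[d] - means[d]) ** 2 for s in samples) / (n - 1), MIN_VARIANCE)
            for d in range(dims)
        ]
    return means, variances

def novelty(vector, means, variances):
    """Distance from the class mean, each dimension scaled by 1 / variance."""
    return math.sqrt(sum((x - m) ** 2 / v
                         for x, m, v in zip(vector, means, variances)))

means, variances = class_stats([[1.0, 2.0], [3.0, 2.0]])
print(novelty([2.0, 2.0], means, variances), novelty([4.0, 2.0], means, variances))
```

A vector at the class mean has novelty 0, and a deviation along a dimension the class never varies in is penalized heavily because its variance is tiny.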

Submissions

We rely on user submissions to grow the database.

But IPv6 adoption is still not as high as we would like :(

In the time it took to get 4,700 IPv4 submissions, we got only 97 IPv6 submissions.

Challenges

Low ratio of training samples to classes. 16% of training samples are the only member of their class; 10% are in a two-sample class.

20–30% of classes are unknown, embedded OSes (identified only by hardware model number).

Network-corrupted and missing features.

Lack of ground truth. Very few training samples compared to IPv4.
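The sample-to-class statistics quoted above (share of samples in one-sample and two-sample classes) can be computed from a list of class labels as in this sketch. The function name and the toy labels are made up for illustration and are not drawn from the real database.

```python
from collections import Counter

def share_in_classes_of_size(labels, size):
    """Fraction of samples whose class has exactly `size` members."""
    class_sizes = Counter(labels)
    matching = sum(1 for label in labels if class_sizes[label] == size)
    return matching / len(labels)

# Toy labels: class A has 1 sample, B has 2, C has 3.
toy_labels = ["A", "B", "B", "C", "C", "C"]
print(share_in_classes_of_size(toy_labels, 1),
      share_in_classes_of_size(toy_labels, 2))
```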

Evaluation

10-fold cross-validation on our training set of 290 samples yields an accuracy of 69%.

If we allow near misses (e.g., one Linux 3.x class confused for another Linux 3.x class), accuracy rises to 80%.
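The difference between exact and near-miss accuracy can be illustrated with a small scoring function. The grouping rule used here (treat labels as equivalent when they agree up to the first dot, so "Linux 3.12" and "Linux 3.18" both map to "Linux 3") is a hypothetical stand-in for whatever near-miss definition was actually applied, and the labels are toy data.

```python
def accuracy(truth, predicted, group=lambda label: label):
    """Fraction of predictions that land in the same group as the truth."""
    hits = sum(1 for t, p in zip(truth, predicted) if group(t) == group(p))
    return hits / len(truth)

def major_version(label):
    """Hypothetical near-miss grouping: keep everything before the first dot."""
    return label.split(".")[0]

truth = ["Linux 3.12", "Linux 3.18", "Windows 7", "Linux 2.6"]
predicted = ["Linux 3.12", "Linux 3.12", "Windows 7", "Windows 7"]

print(accuracy(truth, predicted))                 # exact matches only
print(accuracy(truth, predicted, major_version))  # near misses allowed
```

In this toy data, the second prediction is wrong under exact matching but counts as a near miss, while confusing Linux 2.6 with Windows 7 is an error under both measures.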

Work in progress

New probes

Inducing fragmentation
IPv6 extension headers
Multicast listeners

Feature imputation

Fingerprints are often missing features (which may or may not be a characteristic of the OS)
In our tests, imputation increases cross-validation accuracy by 1–2%
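As an illustration of what imputation means here, the sketch below fills each missing feature with the mean of the observed values for that feature. Mean imputation is only one possible strategy and is an assumption of this example; the notes do not say which method was tested. `None` marks a missing value, and the fallback of 0.0 for a feature that is never observed is likewise arbitrary.

```python
def impute_means(rows):
    """Replace None entries with the per-feature mean of the observed values."""
    dims = len(rows[0])
    means = []
    for d in range(dims):
        seen = [row[d] for row in rows if row[d] is not None]
        # Arbitrary fallback when a feature is missing from every row.
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return [
        [means[d] if row[d] is None else row[d] for d in range(dims)]
        for row in rows
    ]

# Toy rows of [HLIM, TCP_WSCALE]; two fingerprints each lack one feature.
rows = [[64, 5], [128, None], [None, 8]]
print(impute_means(rows))
```

A drawback of any such fill-in is that it hides the fact that the feature was absent, which matters if the absence itself is characteristic of the OS.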