ProxyScan

nmap --proxy http://host1,socks5://host2 --proxy socks4://user@host3,http://user:pass@host4 -sK targets
nmap --proxyfile proxies.txt targets
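As a rough illustration of how one comma-separated chain in the proposed --proxy syntax might be split into hops, here is a minimal Python sketch. The function name parse_chain and the tuple layout are hypothetical, not part of Nmap; ports are left as None when the URL does not give one, since the proposal does not specify defaults.

```python
from urllib.parse import urlsplit

def parse_chain(spec):
    """Split one --proxy argument into a list of hops.

    Each hop is (scheme, username, password, hostname, port).
    Assumes credentials contain no commas, since the comma
    separates hops in the proposed syntax.
    """
    hops = []
    for item in spec.split(","):
        url = urlsplit(item.strip())
        if not url.scheme or not url.hostname:
            raise ValueError(f"bad proxy URL: {item!r}")
        hops.append((url.scheme, url.username, url.password,
                     url.hostname, url.port))
    return hops

if __name__ == "__main__":
    for hop in parse_chain("socks4://user@host3,http://user:pass@host4"):
        print(hop)
```

Each --proxy option would yield one such list, so the first command above would produce two chains of two hops each.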

Perhaps

nmap --proxy ftp://anonymous@ftphost -PK -sK

Considerations

Privacy

One of the purposes of a proxy scan is to maintain privacy. For that to be effective, as little as possible must be sent over an unproxied channel. For this reason domain name lookups should be disabled or done through a proxy. It would be possible to combine non-proxied and proxied scans, for example a SYN host discovery followed by a proxied TCP connect port scan, but as a practical matter it may make sense not to support that.

Tor's SOCKS extensions include the RESOLVE and RESOLVE_PTR commands, which resolve names and do reverse lookups anonymously.

Nsock has to support chained proxy connections, ideally in a transparent way. You should be able to create an iod, set a proxy chain on it, connect, and only be notified when the connection to the final host succeeds or fails.
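To make the chaining concrete, the following Python sketch lists the requests that would have to be sent, in order, over a single TCP connection to build a chain: each proxy is asked to connect to the next hop, and the last proxy is asked for the scan target. It is only an illustration under stated assumptions, not Nsock code. The function names and the (scheme, host, port) hop format are hypothetical, and only HTTP CONNECT and SOCKS4 are covered.

```python
import socket
import struct

def http_connect_request(host, port):
    """Bytes asking an HTTP proxy to open a tunnel to host:port."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")

def socks4_connect_request(ipv4, port, user=""):
    """Bytes asking a SOCKS4 proxy to connect to ipv4:port.

    Plain SOCKS4 carries only an IPv4 address, so ipv4 must be a
    dotted-quad literal, not a hostname.
    """
    # VN=4, CD=1 (CONNECT), DSTPORT, DSTIP, USERID, NUL terminator.
    return (struct.pack("!BBH", 4, 1, port)
            + socket.inet_aton(ipv4)
            + user.encode("ascii") + b"\x00")

def chain_handshakes(chain, target):
    """Return the requests to send, in order, over one TCP connection.

    chain is a list of (scheme, host, port) hops; target is the
    (host, port) being scanned. The request sent to hop k names
    hop k+1, and the last hop is asked for the target itself.
    """
    builders = {"http": http_connect_request,
                "socks4": socks4_connect_request}
    next_stops = [(h, p) for _, h, p in chain[1:]] + [target]
    return [builders[scheme](host, port)
            for (scheme, _, _), (host, port) in zip(chain, next_stops)]

if __name__ == "__main__":
    chain = [("http", "192.0.2.1", 8080), ("socks4", "192.0.2.2", 1080)]
    for request in chain_handshakes(chain, ("192.0.2.3", 80)):
        print(request)
```

In an event-driven design, each request would be written only after the previous hop's success reply has been read, and the caller would be notified once the reply to the last request arrives.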

Parallelism

Using Nmap with proxychains works, but it is very slow because proxychains 3.1 turns nonblocking sockets into blocking ones. The connect calls that Nmap expects to return immediately instead return only after a connection, a connection refusal, or a timeout. We can go faster by making asynchronous connections, perhaps handling an event as each link in the chain is completed.

What to do with more than one proxy chain? A simple solution is to treat each chain equally and round-robin requests among them. Another is to assign each chain a weight, perhaps based on RTT, so that faster chains are chosen preferentially:

w(i) = (1 / rtt_i) / Σ_{j=1}^{n} (1 / rtt_j)

The completion time of a parallel scan is the maximum of the total times spent sending through each proxy chain. The above formula minimizes that maximum by making all the times equal. If N probes are sent in total, then using the above formula the time spent sending through chain i is

t(i) = N × w(i) × rtt_i
     = N / Σ_{j=1}^{n} (1 / rtt_j)

which does not depend on i and is therefore equal for every i.
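The weighting and the equal-time property can be checked numerically. The short Python sketch below computes w(i) from a list of RTTs and the time t(i) spent on each chain. The function names are made up for illustration, and the sketch keeps the model used above: probes on one chain are sent one after another, each costing one RTT.

```python
def chain_weights(rtts):
    """Return w(i) = (1 / rtt_i) / sum_j (1 / rtt_j) for each chain."""
    inverses = [1.0 / rtt for rtt in rtts]
    total = sum(inverses)
    return [inv / total for inv in inverses]

def per_chain_times(n_probes, rtts):
    """Return t(i) = N * w(i) * rtt_i for each chain."""
    return [n_probes * w * rtt
            for w, rtt in zip(chain_weights(rtts), rtts)]

if __name__ == "__main__":
    rtts = [1.0, 10.0]
    print(chain_weights(rtts))
    print(per_chain_times(100, rtts))
```

The weights sum to 1, and every entry returned by per_chain_times is the same value, N divided by the sum of the inverse RTTs, up to floating-point rounding.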

For example, if rtt₁ is 1 s and rtt₂ is 10 s, we want to send 10/11 of the probes (91%) through proxy chain 1 and 1/11 (9%) through proxy chain 2. Even though chain 1 is much faster, in the time it takes to send 10 probes through it we get to send one through chain 2 for free. This way we can send 100 probes in about 90.9 s (100 / 1.1) rather than the 100 s that chain 1 alone would need.

Achieving this probe distribution is reminiscent of scheduling CPU timeslices to processes in an operating system, and indeed I found a paper on the subject, "Charge-based Proportional Share Scheduling (A deterministic alternative to lottery scheduling)". The algorithm is like a multidimensional variant of Bresenham's line algorithm. It would allow proxy chains to be scheduled efficiently and deterministically.
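As a rough idea of what deterministic proportional scheduling could look like, here is a small Python sketch. It is not the algorithm from the paper. It is a simpler stride-style scheduler that always picks the chain whose next probe would finish earliest, assuming chain i completes its k-th probe at time k × rtt_i. The function name schedule is hypothetical.

```python
import heapq
from collections import Counter

def schedule(rtts, n_probes):
    """Yield the index of the chain to use for each of n_probes probes."""
    # Heap entries are (projected finish time of the next probe, chain index).
    # Ties are broken in favor of the lower chain index.
    heap = [(rtt, i) for i, rtt in enumerate(rtts)]
    heapq.heapify(heap)
    for _ in range(n_probes):
        finish, i = heapq.heappop(heap)
        yield i
        # The following probe on this chain finishes one RTT later.
        heapq.heappush(heap, (finish + rtts[i], i))

if __name__ == "__main__":
    counts = Counter(schedule([1.0, 10.0], 100))
    print(counts[0], counts[1])  # prints "91 9"
```

With RTTs of 1 s and 10 s, the 100 probes split 91 to 9, which matches the 10/11 and 1/11 weights as closely as whole probes allow, and the same inputs always give the same order.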

The above analysis ignores the fact that the RTT may differ between hosts reached through the same chain.