When you connect to another computer on a network, your data usually travels through many devices (also called “hops”, “gateways”, or “routers”). I can use traceroute to list the devices my packets travel through to arrive at steampowered.com from the Recurse Center.
$ time traceroute steampowered.com
traceroute to steampowered.com (126.96.36.199), 64 hops max, 52 byte packets
 1  gateway.net.recurse.com (10.0.0.1)  6.991 ms  1.959 ms  1.961 ms
 2  188.8.131.52 (184.108.40.206)  3.264 ms  2.433 ms  3.217 ms
 3  te0-7-0-18.ccr21.jfk04.atlas.cogentco.com (220.127.116.11)  6.573 ms  4.164 ms  4.728 ms
 4  be2325.ccr42.jfk02.atlas.cogentco.com (18.104.22.168)  3.179 ms  2.550 ms  14.625 ms
 5  be2057.ccr21.jfk10.atlas.cogentco.com (22.214.171.124)  4.080 ms
    be2056.ccr21.jfk10.atlas.cogentco.com (126.96.36.199)  5.944 ms
    be2057.ccr21.jfk10.atlas.cogentco.com (188.8.131.52)  5.057 ms
 6  ae-13.r08.nycmny01.us.bb.gin.ntt.net (184.108.40.206)  3.689 ms  3.687 ms  3.398 ms
 7  ae-3.r07.nycmny01.us.bb.gin.ntt.net (220.127.116.11)  25.137 ms  16.652 ms  12.502 ms
 8  a104-88-12-183.deploy.static.akamaitechnologies.com (18.104.22.168)  4.710 ms  7.913 ms  4.445 ms
traceroute steampowered.com  0.00s user 0.01s system 3% cpu 0.214 total
Here’s a diagram representing the route:
I found a couple of the steps interesting:
Hop 8 is steampowered.com and ends the route. Now I know it is served by Akamai.
Traceroute sends out UDP packets with “time-to-live” (TTL) values starting at 1 and increasing by 1. The TTL field is special because every device that processes a packet decrements it by 1. When a device decrements the TTL on a packet to 0, the packet is “dropped” and not sent to any other devices. Instead, an Internet Control Message Protocol (ICMP) “Time-to-live exceeded” packet is dispatched back to the original sender. When traceroute sends a packet with a high enough TTL to actually reach the final destination, you usually get back “Destination unreachable (Port unreachable)” because the UDP packets are sent to a destination port that is unlikely to be open.
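To make the TTL mechanism concrete, here is a small simulation in Python. It never touches the network: the route is just a list of names, and `send_probe` and `traceroute` are hypothetical helpers written for this illustration, not part of any real tool.

```python
from typing import List, Tuple


def send_probe(path: List[str], ttl: int) -> Tuple[str, str]:
    """Simulate one UDP probe along `path` (routers in order, destination last).

    Returns (responder, icmp_message).
    """
    for router in path[:-1]:
        ttl -= 1  # each router that handles the packet decrements the TTL
        if ttl == 0:
            # The packet is dropped here and the router reports back.
            return router, "Time-to-live exceeded"
    # The TTL was high enough to reach the destination, where the
    # probe's UDP port is assumed to be closed.
    return path[-1], "Destination unreachable (Port unreachable)"


def traceroute(path: List[str], max_hops: int = 64) -> List[Tuple[int, str, str]]:
    """Send probes with TTL 1, 2, 3, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, message = send_probe(path, ttl)
        hops.append((ttl, responder, message))
        if message.startswith("Destination unreachable"):
            break
    return hops


if __name__ == "__main__":
    route = ["gateway", "isp-router", "backbone-router", "destination"]
    for ttl, responder, message in traceroute(route):
        print(ttl, responder, message)
```

Each TTL value reveals exactly one device, which is why the real tool needs at least one round trip per hop.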
I found the name “time-to-live” initially confusing because in other contexts “time-to-live” refers to actual time, not “number of devices travelled through”.
In the example, the entire process took 214 milliseconds, with 153 milliseconds of that spent waiting for packets to return. On the same network, ping steampowered.com gets results in about 8 milliseconds (with some results coming significantly faster). However, ping and traceroute aren’t directly comparable because ping only needs to make one round trip, while traceroute makes at least one round trip per device between you and the destination.
Another thing that makes traceroute slower is that by default three packets (called probes in the manual) are sent at each step. Also, getting the symbolic names (e.g. gateway.net.recurse.com) isn’t free! Unless the name has already been cached, a DNS query is made.
The closest I can get to making traceroute behave like the simplified diagram above is to send only one packet per TTL (-q option) and disable looking up symbolic names (-n option):
$ time traceroute -q 1 -n steampowered.com
traceroute to steampowered.com (22.214.171.124), 64 hops max, 52 byte packets
 1  10.0.0.1  4.793 ms
 2  126.96.36.199  7.811 ms
 3  188.8.131.52  5.402 ms
 4  184.108.40.206  5.815 ms
 5  220.127.116.11  9.054 ms
 6  18.104.22.168  6.528 ms
 7  22.214.171.124  7.988 ms
 8  126.96.36.199  9.635 ms
traceroute -n -q 1 steampowered.com  0.00s user 0.00s system 4% cpu 0.100 total
The total time here is 100 milliseconds, with 57 milliseconds accounted for by the time spent waiting for the responses. That is fast enough for human consumption.
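The 57 millisecond figure can be checked by adding up the round-trip times in the output above. A short Python sketch (the function name `total_wait_ms` is just for this example):

```python
import re

OUTPUT = """\
 1  10.0.0.1  4.793 ms
 2  126.96.36.199  7.811 ms
 3  188.8.131.52  5.402 ms
 4  184.108.40.206  5.815 ms
 5  220.127.116.11  9.054 ms
 6  18.104.22.168  6.528 ms
 7  22.214.171.124  7.988 ms
 8  126.96.36.199  9.635 ms
"""


def total_wait_ms(traceroute_output: str) -> float:
    """Sum every 'N.NNN ms' round-trip time found in traceroute output."""
    times = re.findall(r"(\d+\.\d+) ms", traceroute_output)
    return sum(float(t) for t in times)


if __name__ == "__main__":
    print(f"{total_wait_ms(OUTPUT):.3f} ms")  # prints 57.026 ms
```

Applying the same sum to the first example (three probes per hop) gives about 153 milliseconds.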
It might be possible to speed up traceroute by sending out many probes with different TTLs simultaneously. The ICMP response includes part of the original UDP packet, which makes it possible to identify responses even if they return out of order, as long as you include something unique in the original packet (like a unique destination port). The manual mentions that “Some systems such as Solaris and routers such as Ciscos rate limit ICMP messages.” As a test, I sent 30 packets simultaneously with TTLs from 1 to 30, and it mostly worked. However, I only received a few responses for messages with a high enough TTL to reach the final destination, which could be from rate-limiting.
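Here is a rough sketch of how out-of-order responses could be matched back to probes. It assumes each probe with TTL n was sent to UDP port 33434 + n (33434 is traceroute's conventional base port; the TTL-to-port mapping is my assumption for this sketch), and that `packet` is what an IPv4 raw ICMP socket returns: the outer IP header, the 8-byte ICMP header, then the original IP header and the first 8 bytes of the original UDP datagram. The name `match_probe` is hypothetical.

```python
import struct
from typing import Optional, Tuple

ICMP_DEST_UNREACHABLE = 3
ICMP_TIME_EXCEEDED = 11
UDP_PROTOCOL = 17
BASE_PORT = 33434  # assumed scheme: a probe with TTL n goes to port BASE_PORT + n


def match_probe(packet: bytes) -> Optional[Tuple[int, int]]:
    """Return (icmp_type, probe_ttl) if `packet` is an ICMP error that
    quotes one of our UDP probes, otherwise None."""
    if len(packet) < 20:
        return None
    outer_len = (packet[0] & 0x0F) * 4  # outer IP header length in bytes
    icmp = packet[outer_len:]
    if len(icmp) < 8:
        return None
    icmp_type = icmp[0]
    if icmp_type not in (ICMP_TIME_EXCEEDED, ICMP_DEST_UNREACHABLE):
        return None

    inner = icmp[8:]  # quoted copy of the original packet
    if len(inner) < 20 or inner[9] != UDP_PROTOCOL:
        return None
    inner_len = (inner[0] & 0x0F) * 4
    udp_header = inner[inner_len:inner_len + 8]
    if len(udp_header) < 8:
        return None

    _src_port, dst_port, _length, _checksum = struct.unpack("!HHHH", udp_header)
    return icmp_type, dst_port - BASE_PORT
```

Because the destination port travels back inside the ICMP error, the sender does not need to wait for one response before sending the next probe.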
Another big slowdown I experienced was the timeout when a hop returns no response. In that case, traceroute waits number_of_probes × timeout before trying the next TTL. With the default settings, the wait is 15 seconds (3 probes × 5 seconds).
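The cost of silent hops adds up quickly. As a back-of-the-envelope sketch (the function and parameter names are made up for this example, with defaults taken from the numbers above):

```python
def worst_case_wait(silent_hops: int, probes: int = 3, timeout_s: float = 5.0) -> float:
    """Seconds spent waiting on hops that never answer any probe."""
    return silent_hops * probes * timeout_s


if __name__ == "__main__":
    print(worst_case_wait(1))            # 15.0 seconds with the defaults
    print(worst_case_wait(1, probes=1))  # 5.0 seconds with one probe per TTL
```

Sending a single probe per TTL cuts the penalty for a silent hop to one timeout.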
There is a useful tool, mtr, which is like a combination of traceroute and ping. It first gets ICMP responses like traceroute. Then it continually pings each device to display ongoing statistics. Unlike traceroute, it will not find multiple IP addresses responding to the same TTL. I was happy to see that mtr is much faster than traceroute when a hop does not return anything.
I did experience something odd in mtr for hops where traceroute showed multiple IP addresses (e.g. #5 in the first example). For those hops mtr sometimes reports high rates of packet loss even though packet loss for the final destination is very low. This isn’t a bug with mtr but instead says something about the underlying route!